Thursday, October 23, 2014

Bitnami Supports New Amazon Cloud Region in Germany

Amazon just announced the addition of another region for the Amazon Cloud. This new region, eu-central-1, is located in Germany! We worked closely with Amazon prior to the launch, and we are excited to announce that all of the applications in the Bitnami Library are immediately available for this new region. You can find the new AMIs in the Amazon catalog, as well as in the cloud tab for each of the apps on the Bitnami website.

Support for the eu-central-1 region is also in the works for Bitnami Cloud Hosting and will be released shortly. Bitnami Cloud Hosting is a service that simplifies the process of deploying and managing the Bitnami library of applications and development environments in the cloud. It offers dynamic deployments, automatic backups, monitoring and other features that make it easier to run applications in the cloud. Check it out!

Friday, October 17, 2014

Akeneo added to Bitnami Library!

Akeneo is the latest winner of our monthly app contest, and is now part of the Bitnami Library! We are happy to announce that Akeneo is now available to download on Bitnami.

Akeneo is a Product Information Management (PIM) application designed to simplify your product management processes with a tool that helps centralize and harmonize all the technical and marketing information of your catalogs and products.

Thanks to their successful effort in encouraging their community to vote in the Bitnami contest, Akeneo is now ready to install in a few clicks using the Bitnami installers (available for Linux, Windows and Mac OS X), virtual machine images (VMs) and cloud images for the Amazon EC2 and Azure clouds.

Akeneo PIM dashboard

Do you want to quickly check out Akeneo? You can launch a free cloud demo server. By clicking the button below, you will have your own Akeneo instance running for 1 hour.
We also had the opportunity to interview Frédéric de Gombert, CEO of Akeneo, who was kind enough to answer some questions about the project.

How was the Akeneo project started? What are the origins of this project?

The Akeneo journey really started in Las Vegas during the Magento Imagine Conference in May 2012. Yoav Kutner (former CTO & co-founder of Magento) and I were talking about what was really missing for merchants. We quickly agreed that CRM and PIM were major pain points for our clients. Not because there was no existing solution in the market, but because those solutions were mostly closed-source, expensive and rarely designed for e-commerce needs. Six months later, and with the help of a dream team including two other co-founders (Benoit & Nicolas), we officially founded Akeneo. Our mission: building an open and intuitive PIM to help marketers struggling with spreadsheets and/or archaic tools. (Yoav also founded OroCRM at the same time, but that's another story!)

What is the main goal for Akeneo?

At heart, Akeneo is an intuitive and super connected product information management software. But it is also a productivity tool: our main goal is making product management much more efficient. Clients can centralize and push a product to a new channel much more quickly — on average, it takes 60 to 80 percent less time.

Which projects or organizations are using Akeneo currently? What kind of projects do they use it for?

We currently have more than 7,000 live installations of Akeneo around the world. It's a good start considering that the first stable version was released in March 2014. We have a wide variety of customers: fashion and luxury brands like Lancaster or Charlotte Olympia, large retailers like Auchan (the largest supermarket chain in Europe, with more than 1,600 supermarkets worldwide), real estate companies, manufacturers... Every company selling products, virtual or physical, online or offline, can make good use of Akeneo!

What do you expect will be the main benefits of having Bitnami packages available for Akeneo?

Two main benefits: easing the installation and evaluation process of Akeneo for non-technical users, and helping them find an efficient and cost-effective hosting solution if needed.

Would you like your favorite app to be part of Bitnami? Be sure to suggest and vote for it in our monthly contest.

Bitnami Open Source Leaders Interview Series: Sytse Sijbranij from GitLab

GitLab leads by example with its passion for creating open source collaboration tools that let you do everything with code. As part of our Open Source Leaders podcast series, we interviewed Sytse Sijbranij, CEO of GitLab, to learn how they maintain their open source community and what's next for their tools.

Below is a sample of the topics we covered:
  • Why GitLab?
  • Who should use it?
  • What is the relationship between GitLab and GitHub?
  • Where is the best place to run GitLab?
  • How does the GitLab community work?
  • Where is GitLab going next?
You can launch a GitLab application or stack to the cloud with Bitnami for free, or download any of our free native installers or VMs to run the software locally.

Stuart Langridge:       This is the Bitnami Open Source Leaders Series of Interviews.  I’m Stuart Langridge, and I’m talking to Sytse Sijbranij of GitLab.

Sytse Sijbranij:            Hi Stuart thanks for having us.

Stuart Langridge:        No problem.  So Sytse, you’re CEO and co-founder of GitLab, yes?

Sytse Sijbranij:            Yes. That’s correct. 

Stuart Langridge:        So tell us, what is GitLab?  

Sytse Sijbranij:            GitLab is open source software to collaborate on code.  It means you can download a package, install it, and you’ll have version control, issue management, code reviews, a wiki, all ready to run within your organization.

Stuart Langridge:       So you could think of this essentially like a self-hosted version of GitHub, yes?

Sytse Sijbranij:           Exactly. 

Stuart Langridge:       Since we mentioned GitHub, the elephant in the room. Obviously there’s a big advantage in that I’m running it inside my organization, so it’s private if I want it to be. If you look at this field there are an awful lot of different attempts at this, both self-hosted and hosted, such as GitHub or Launchpad. Talk about why GitLab’s better than the competition, what you do really well, and why people would want to go with you.

Sytse Sijbranij:            One of the things we do really well is that you get a lot of possibilities to modify it as you see fit. With some of our competitors’ products, you get a black box virtual machine that you don’t even have proper access to. With GitLab, you can use it any way you like, with nginx or with Apache. We’re the only product in the whole marketplace, commercial or non-commercial, that you can actually run in a clustered configuration with multiple application servers. GitLab is really complete. It is all about integration; integration with other issue trackers such as Jira and Redmine. It’s just a very polished product compared to a lot of other open source projects that are out there. It really contains everything you need. We really listen to the community and our clients, and there are a lot of features that enterprises find handy in GitLab. For example, not only do we have public and private projects, but we also have internal projects; projects that are visible only to people who are logged in.

Stuart Langridge:        Even within the organization, you can lock off certain projects to certain development teams and so on?

Sytse Sijbranij:            Sure, that’s a feature in many projects, but these internal projects are visible to anybody who has a login.  Imagine as an organization, most of your projects are only for internal use, but you’ll have some public projects that you released out to the public and have them contributing back. Within the company you probably want to collaborate on most projects, but you have to first be added to a project before you can see it, and that’s a hindrance compared to the open source workflow. This is not a problem if you’re with five people or ten people, but it starts becoming a problem if you’re with 50,000 people which some of the organizations running GitLab are. Now you can say this is an internal project, and everybody in the organization can see it and can try to contribute back. With this, you get the open source workflow within this large company.  It’s also called inner sourcing, working with an open source workflow within a large organization.

Stuart Langridge:       You spoke about large organizations using GitLab.  Are you targeting particular types of organizations or companies?  Give us a sense of the sorts of people who are running GitLab now and the sorts of people who you’d like to be running GitLab in the future.

Sytse Sijbranij:           We’d like everybody to be running GitLab, so we’re not targeting anything specific. I cannot talk about some of our largest and most impressive, but some of the companies running GitLab are Red Hat, Electronic Arts, NASA, Comcast, IBM, SpaceX, Qualcomm, SOHO, AT&T, but also universities, such as Michigan State University and the University of Texas at Austin. Also non-profits such as Interpol, the police agency, and the International Center for Missing and Exploited Children. It’s really all over the world. In advertising, publishers are running it, but also in research, the Fraunhofer Institute is running it. We like that diverse a group; it’s great for GitLab to be everywhere.

Stuart Langridge:       What about if I’ve got a small development team?  Is there a size of organization, a size of project below which it’s not worth running GitLab, or will it be suitable if I basically have one small internal project and only a couple of developers?

Sytse Sijbranij:           Yeah, for sure it’s suitable then. Of course your logo won’t end up on our homepage, but many people run it in a really small organization. They just start a DigitalOcean server, and they get up and running within a few minutes. People even run it for themselves on their own server back at home, for their own projects. They use GitLab to have a visual overview of all the repositories and as a kind of remote backup. They use the code review features to review their own changes. So they run it on a home server or even on a Raspberry Pi.

Stuart Langridge:       GitLab is obviously set up so it can be scaled out to as large as you want. How does that scaling story work technically? If you’re using Bitnami, for example, to deploy into EC2 or Rackspace or something like that, how are you set up to scale GitLab to the size that you need it to be?

Sytse Sijbranij:            If you just provision a decent Amazon server, let’s say a c1.medium, you can size it up to thousands and thousands of users. For years, GitLab.com has offered a service to more than 10,000 people, which has been running on a single Amazon server. So you can scale on a single server for a very long time, and we do strongly recommend that. If you want to scale out on Amazon, you could have one file server. You’d attach an EBS drive of one terabyte, use that as an NFS server, and then have a couple of application servers in front with an Elastic Load Balancer, and use Amazon RDS with PostgreSQL or MySQL to store the database. One limit on Amazon is the one terabyte limit for EBS drives, so we have experience with striping across a couple of those volumes with LVM, so you can scale to multi-terabyte setups.
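The scaled-out layout Sytse describes can be sketched with two configuration fragments (a minimal sketch: the hostname and network range below are hypothetical, and the repository path is the common default for omnibus installations but may differ on your setup):

```
# On the NFS file server (backed by the EBS volume), in /etc/exports:
# export the repository storage to the application servers' subnet
/var/opt/gitlab/git-data  10.0.0.0/24(rw,sync,no_root_squash)

# On each application server, in /etc/fstab:
# mount the shared repository storage from the file server
fileserver.internal:/var/opt/gitlab/git-data  /var/opt/gitlab/git-data  nfs  defaults  0  0
```

The Elastic Load Balancer then spreads HTTP traffic across the application servers, while the database lives on Amazon RDS rather than on any of these machines.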

Stuart Langridge:       Obviously in order to set up that kind of environment, you need quite a lot of technical knowledge. What would you consider to be the level of technical knowledge to run GitLab - not necessarily in a very complicated, multiple-striped, multiple cloud deployment -  just internally in my organization? Is it something where you’d need a very competent system administrator?  Is it something that can be set up and left to run relatively easily? 

Sytse Sijbranij:            I think it can be set up and run relatively easily, and I think the Bitnami packages are a good example of that. People just set them up and they keep running. There are also a lot of other options to install GitLab. What we focused on is making it very easy to install; you can download a package that installs in two minutes. It doesn’t require any Rails knowledge, none whatsoever, and then you’re still able to upgrade with a single command. We’ve really focused on installing and upgrading over the last year, and that’s become very easy. As long as you’re comfortable with the command prompt on a UNIX server then you can do it, and with Bitnami it’s even easier.

Stuart Langridge:       If you think about your large, diverse user base, do you get a sense that most of them are deploying on hardware inside their own company, or are most people deploying into the cloud, DigitalOcean servers, or is it quite evenly balanced between all those different ways?

Sytse Sijbranij:           Yes, quite balanced. Some people like DigitalOcean. For a small team that will be a common way to start. If you’re an individual, probably your home server. If you’re a larger organization, sometimes they have hardware at their sites. Sometimes they have hardware in data centers, or sometimes they have a private cloud. People, even if they’re comfortable with the cloud, still want their own GitLab installation. They don’t want to use a SaaS, because they want to inspect the code, be able to modify the code, customize the installation to fit their preferences, connect it to an LDAP server hosted behind their VPN, and run their logging and intrusion monitoring software on it. So there are lots of reasons to run it in the cloud but still run it on a server that you control.

Stuart Langridge:       GitLab is itself hosted on GitHub, so you’re obviously happy to work with existing SaaS solutions, but obviously again there are benefits to running your own stuff in house. There’s quite a divergence between strongly open source pieces of software and proprietary commercial SaaS alternatives, which might have the features you want, but you can’t inspect the code. Talk about how GitLab is trying to find a balance there.

Sytse Sijbranij:           We try to be really pragmatic about these choices. That’s one of the reasons that we still have a repository on GitHub: many people have GitHub accounts, and we don’t want to miss out on contributors who want to contribute via that channel. I do want to mention that the canonical source of GitLab is available on, so that’s where you find the real version. But then, what is the real version in a world of distributed version control? So we’re quite pragmatic. If people are comfortable contributing from a certain platform, we don’t want to deny them the opportunity. We want to make software that works. It’s open source, but it’s not GPL-licensed. It’s MIT, so anybody can do anything they want with it, and we believe that’s real freedom for people and companies. They don’t have to be worried about anything. We want it to be a very polished product that you can use without any problems. We spend a lot of time fixing all the small things, such as fixing the UI or having good documentation. That’s why we need income as a company, so we’ve created a version of GitLab called the Enterprise version, which focuses on organizations with 100 people or more using GitLab. It offers some extra features, and it’s how we generate revenue and also grow as a company.

Stuart Langridge:       That makes sense. GitLab is primarily web-driven, yes? You use the web interface to get everything. What are your policies on things like supporting modern browsers, on working well on people’s mobiles, or using responsive designs? Are you using the latest cutting edge HTML for that and cutting off older browsers, or does it work all the way back down to Lynx?

Sytse Sijbranij:            It’s web-driven, of course. SSH connections are supported, but you mainly interface with it via the web browser. We do have strong support for mobile. For example, everything is tested on an iPad before it ships, so the whole user interface works very nicely there. We support the latest versions of Chrome, Firefox, Safari 7+, Opera, and Internet Explorer 10+. We expect developers to have relatively recent browsers, and also, if they have a mobile phone, that it has a substantial screen area, but we definitely want to support mobile. We’re on the latest Bootstrap, and we’re on the latest JavaScript frameworks so that people can have a good experience there.

Stuart Langridge:       What’s the GitLab release strategy?  How often do you put out new versions and what’s the cadence of that?

Sytse Sijbranij:            We have a pretty awesome cadence, we think. We release once a month, on the 22nd of the month, and we have never missed that release window since we started releasing in 2011. People can be very sure when there will be a new GitLab release, and we only ship what’s finished, so we have a pretty high quality standard. We have a merge window that closes on the 15th, and then we do QA for a couple of days before we release, and people look forward to it. It’s always a bit of a celebration. Sometimes we even do a video call or something like that to celebrate the new release.

Stuart Langridge:       How do upgrades work?  Do I just SSH into my server and do git pull, or is there some kind of managed upgrade service? 

Sytse Sijbranij:           Yeah, it can be like that. If you installed it by hand, you can do the git pull and look at the instructions for what else you should change. We also have an upgrade script, so you don’t have to type as many lines. Many people nowadays use the packages, and that means you download the package and just do an apt install, and that manages all the rest of it.

Stuart Langridge:       So you’re providing Debs or RPMs?

Sytse Sijbranij:            Exactly, debs and RPMs for CentOS, Red Hat, Debian and Ubuntu, for multiple versions. We make those with Omnibus, which is a very interesting piece of technology from Opscode, and it allows us to make a package that has everything: not only the Rails components but also all the gems, all the Rails assets precompiled, and all the native compilation for the Rails gems. Also Nginx, Postgres, Unicorn, everything you need to run a GitLab server. It prevents a lot of problems and it makes things a lot easier, especially upgrades.

Stuart Langridge:       I love the idea of people having little parties when you do a release. So that leads us on to talking about the GitLab community. Obviously you’ve got a contributor community, but you’ve also got a user community. How much do they overlap and what’s going on there?  If I’m a GitLab user, if I’m using it in my organization, would I want to get involved in the community and what would I get from that?

Sytse Sijbranij:            Certainly, it’s up to you if you want to get involved. I think the vast majority use GitLab and are very happy using it, and that’s great. We love our users. You don’t have to contribute anything back. But it might be fun to explore, and most people start by offering small suggestions. They have a feature request they’d like to see. They contribute to our feature request tracker, and sometimes they find a piece of text or some small thing that’s broken and they contribute a fix back, and that’s how they get going. We have a core team of more than ten people, most of them not from GitLab BV but from the rest of the community. We also have some people on the issue trackers that are called merge marshals; they review all the merge requests that come in and help people get them ready so they can be merged. More than 600 people have contributed to GitLab so far.

Stuart Langridge:       Do you find the GitLab community, the GitLab user community are helping one another, or is it kind of hub and spoke where everyone talks to you and you help them out? 

Sytse Sijbranij:            No, it’s certainly people talking to one another. I’m very glad not everybody talks to me, because I wouldn’t have any time left. People find one another, and there are some people who like to get in our chat room and are very active there. There are some people who manage IRC, and there are all these different preferences, and people find one another there. There are people on GitHub and GitLab talking about contributions. There are a lot of ways, and people are even making money with GitLab. There’s, where you can spin up a GitLab CI server as a managed instance, which is a great service and it’s run by George. We really welcome people not only doing volunteer work but also trying to make money with GitLab.

Stuart Langridge:        If I’m looking to do something like this within my organization and I’m looking at GitLab and at the competitors, what sorts of questions should I be asking to help me decide between them?

Sytse Sijbranij:           I think you should think about what kind of workflow you want. You should realize that the whole open source workflow is what we do. We’re doing merge requests, proposing code, and that’s what makes Git so powerful; that’s why people have switched to Git. If you’re doing Git without proper software, without doing code reviews, I think you’re doing it wrong, and you want software that supports that. There are lots of things to look into, but I think the advantage of GitLab is that its open source version is already very powerful; it’s not limited in any way. You can grow to an installation of many thousands of people and just run it without owing anybody anything. And if you want commercial support or the extra features, there’s also a company behind it that can support you. Most people find that GitLab is easy to install, has excellent documentation, and that if there are problems, they get fixed very, very fast.

Stuart Langridge:       Where’s GitLab going next? What are your plans for the next six months, the next year, the next five years?

Sytse Sijbranij:           We have some ideas, but we don’t really have a roadmap. We used to have a roadmap, but we didn’t like it, because it’s always very easy to come up with what we should do this month, because everybody’s saying the same thing. Two months ago, for example, we heard from multiple channels that people were fed up with the issue labeling. You had issues, you had labels, you had colors, but they were impossible or hard to customize, so it was really clear that that needed to change, and we spent some time on that. It’s always like that; it’s really easy to decide what to do, because somehow there’s a signal coming from multiple sides of the community. We don’t want to run ahead of ourselves, and we don’t want to start thinking about what should be done in five months and start promising that. We don’t have a big master plan. For example, a year ago everyone was complaining that upgrading was hard, so we fixed that with our omnibus installer, and I’m sure that for this release we’ve got it figured out. We have some ideas for the next release, and I’m sure by the time we do the next release, we’ll have ideas for the release after that. So we’re very roadmap-lite, and if we have any time left, we can always spend a little time on GitLab CI, our continuous integration product.

Stuart Langridge:       Is the continuous integration thing a separate project which happens to be from the same company, or are you seeing CI as being tightly integrated into GitLab as a whole?

Sytse Sijbranij:            It’s a separate application, but you can only use it with GitLab. We found that with CI, one of the hurdles is setting everything up: adding projects to it and arranging all the code cloning and authorizations. With GitLab CI, you can just log in with your GitLab credentials, we’ll show you the list of projects on the server, and with one click you can create a CI project out of it and it will clone automatically. There’s very little to configure, and we think that’s necessary to convince people to actually start using CI, because lots of companies don’t have all their projects in their CI server yet. So it’s bound to GitLab, but it’s a separate application and you can work on it separately. You can host it on a different server, etcetera.

Stuart Langridge:       That’s interesting that you talk there about the tight integration. One of the strengths of SaaS solutions in this kind of area is they have a very strong developer API because they’re up in the cloud, and you can’t fiddle with the code yourself. If you look at GitHub, BitBucket, or LaunchPad they have a strong developer API so you can build applications which talk to these things, web hooks for notifications when projects have changed and so on. Does GitLab also have that detailed developer API, so I can build apps and scripts which integrate with GitLab? 

Sytse Sijbranij:            Sure. The APIs that GitLab CI uses are all public and there are CI tools like Macnew CI that even have explicit support for GitLab.

Stuart Langridge:       That makes perfect sense. Where do people go if they want to find out more about GitLab the project and the product?

Sytse Sijbranij:            Just Google GitLab and you’ll probably see our site, where you can read more. There’s a video at the bottom of the home page that has a few examples, and you can get into the documentation from there.

Stuart Langridge:       Excellent, so thank you very much for talking to us. 

Thursday, October 16, 2014

Drupal security fix SA-CORE-2014-005

The Drupal project has just released a new version that fixes a highly critical issue, SA-CORE-2014-005: a vulnerability in the Drupal 7 API allows an attacker to send specially crafted requests resulting in arbitrary SQL execution.

We have released Bitnami Drupal 7.32 installers, virtual machines and Amazon EC2 images that fix this issue. We are continuing to work on upgrading other Drupal-based applications like CiviCRM and OpenAtrium.

You should patch your Drupal installation as soon as possible. You can follow the step-by-step instructions in this blog post. Basically, you will need to SSH to your machine, change to the Drupal installation directory and execute drush.

$ cd /opt/bitnami/apps/drupal/htdocs
$ drush up

If everything goes well you should see something similar to the following:

Project drupal was updated successfully. Installed version is now 7.32.
Backups were saved into the directory /home/bitnami/drush-backups/bitnami_drupal7/20141017020023/drupal.       [ok]
No database updates required                    [success]
'all' cache was cleared.                        [success]

Finished performing updates.                         [ok]

In case you are not familiar with Bitnami Drupal, it is a self-contained and easy-to-use distribution that makes it simple to start developing and deploying Drupal applications.

Wednesday, October 15, 2014

POODLE SSL vulnerability (CVE-2014-3566)

A new vulnerability in the SSL protocol was published today. Codenamed POODLE, it exploits a flaw in the design of SSL version 3.0 that allows the plaintext of secure connections to be calculated by a network attacker.

Recent Bitnami stacks released in the last 6 months are NOT affected as the default, optimized configuration we use for SSL is not vulnerable. If you are running an older version of a Bitnami stack you may be vulnerable and need to change your configuration. You can learn more in our wiki page for this issue.
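If you need to update an older stack by hand, the usual mitigation is to disable SSLv3 in the web server's SSL configuration. As a minimal sketch for Apache-based stacks (the exact configuration file and restart command depend on your stack and version):

```apache
# In the Apache SSL configuration (for example httpd.conf or an included SSL conf file):
# allow only the TLS protocols; SSLv2 and SSLv3 are disabled
SSLProtocol all -SSLv2 -SSLv3
```

Then restart Apache, for example with sudo /opt/bitnami/ctlscript.sh restart apache on Bitnami stacks, and verify that SSLv3 connections are rejected.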

Tuesday, October 7, 2014

Bitnami participates in the GitHub Student Developer Pack

The GitHub Student Developer Pack is a collection of developer tools made available to students free of charge. Students enrolled in degree-granting programs at all levels are eligible to sign up for the pack and get a free GitHub Micro account with 5 private repositories. In addition, those students will have free access to an entire suite of useful developer tools from selected companies.

Since this program is intended to give students free access to the best developer tools, we're happy to announce that Bitnami is participating: when signing up for the GitHub Student Developer Pack, students will have free access to the Business 3 plan in Bitnami Cloud Hosting for a year - a $588 value! This plan allows users to launch and manage up to 3 cloud servers on the Amazon Cloud (AWS) from Bitnami's library of close to 100 applications and development stacks. This means that students can get a fully configured, ready-to-use development environment for LAMP, Django, Ruby on Rails or Node.js, or apps such as Redmine, Jenkins, Drupal, TestLink and more, up and running in just a few clicks in the cloud!

To learn more and sign up for the GitHub Student Developer Pack, you can visit this link.

Tuesday, September 30, 2014

Bitnami Bootcamp 2014

Earlier this summer, we announced the first Bitnami Cloud System Administration Bootcamp with the goal of sharing the know-how we have accumulated over the years at Bitnami to train the next generation of system administrators. Over the course of a few weeks, we received over a hundred applications, from which we invited 14 candidates to participate after an extensive selection process and one-on-one interviews.

Over a four-week period that just wrapped up last week, we geeked out on Linux and the cloud. The mornings covered mostly theory (albeit with small interactive examples) while the afternoons were devoted to practical exercises. Each student managed their cloud servers using Bitnami Cloud Hosting (of course!) and we tracked and reviewed all assignments using a Bitnami-hosted Phabricator server. We started out learning about system internals, Git, bash scripting, networking and Unix build toolchains. We continued with the ins and outs of automating the deployment of apps written in a variety of languages and frameworks: PHP, Python, Java, Rails and Node.js. We invited guest speakers who covered a variety of topics in depth, including Tomcat, OpenStack and configuration management tools. We explored the AWS, Azure and Google clouds, learning how to launch servers in each one of them, both through the management console and through their APIs and command line tools. We learned how to perform end-to-end deployment automation using Docker, Ansible and Bitnami tools. We ended with sessions on security and performance and built a project that brought together everything we covered in the course.

After the bootcamp, we invited six of the attendees to join the Bitnami team, and they start this Wednesday! We are excited to have these awesome engineers on the team, and you can expect a significant increase in the number of apps we provide in the near future thanks to our expanded team.

We plan to repeat the experience early next year, expanding and improving the topics and materials. If this sounds interesting to you, consider applying and joining us next year in Seville!

Cloud System Administration Bootcamp 2015 early sign up

Monday, September 29, 2014

Bitnami Open Source Leaders Interview Series: Dave Page from PostgreSQL

Billing itself as the "world's most advanced open source database", PostgreSQL is bundled with many of our most popular Bitnami apps, including our LAPP stack, Dev Pack, OpenERP/Odoo, Discourse and more. To kick off our new Open Source Leaders podcast series, we interviewed Dave Page, the Director of PostgreSQL Europe, to learn more about the PostgreSQL database and what's to come from the project.

Here are just a few of the topics we covered:

  • How does the PostgreSQL community work?
  • Why should one be involved in the community?
  • What is next for PostgreSQL?
  • How does PostgreSQL stack up next to NoSQL leaders?
  • Does PostgreSQL work in the cloud?
  • What level of experience do you need to run PostgreSQL?

You can launch a PostgreSQL application or stack to the cloud with Bitnami for free, or download any of our free native installers or VMs to run the software locally. You can browse a full list of Bitnami app stacks that contain PostgreSQL on the Bitnami site.

Stuart Langridge:        This is the Bitnami Open Source Leaders series of interviews.  I’m Stuart Langridge and I’m here talking to Dave Page.  Dave is Director of PostgreSQL Europe, Vice Chair of the PostgreSQL Community Association of Canada, he’s chief architect of Tools and Installers at EnterpriseDB and he’s a core team member of the PostgreSQL project.  Hey Dave, welcome to the interview.

David Page:                Thank you, Stuart nice to talk to you.

Stuart Langridge:        So what is Postgres? 

David Page:                Postgres is an open source relational database management system. We compete mostly with the likes of Oracle and SQL Server.  It’s completely open source, under a permissive license.  We have a very large community that is both using Postgres and submitting patches on a regular basis.  So we’re pretty diverse in the things that Postgres supports, because we have this large community of people from all sorts of different areas and different industries helping us build the product and add support for all sorts of useful features.

Stuart Langridge:        You talked there about the size and diversity of the community.  Is Postgres targeting a particular use case or does it do everything from small data storage up to huge data stores?

David Page:                Absolutely.  We’ve got people running everything from a 10-megabyte database up to 10-terabyte databases.  There are people who are using it for data warehousing, for OLTP, and for storing their unstructured data.  It’s pretty versatile and it’s always been aimed at being a general database that’s useful to everyone. 

Stuart Langridge:        Are there particular areas that you’re interested in targeting at Postgres where you haven’t got there yet, where you’re working on getting into those different kinds of environments and uses?

David Page:                The big area for us at the moment is unstructured data.  People have obviously moved very much towards some of the no-SQL databases for that kind of workload.  Postgres is actually pretty well suited to it as well.  We have fantastic support for json, which is being enhanced with a new jsonb data type that came with our 9.4 release, and which has outperformed the no-SQL market leader in all the benchmarks I’ve seen.  It gives users the opportunity to take those unstructured workloads, move them into their relational database and really amalgamate all their data into one location rather than having to run multiple technologies at once. 

Stuart Langridge:        As you say, there’s quite a big trend toward using no-SQL databases. Does that mean that Postgres is now a drop-in replacement for some of the leading no-SQL things, say MongoDB or CouchDB?

David Page:                It’s not a drop-in replacement because we don’t support their wire protocols.  But it’s certainly a near drop-in replacement.  I mean, pretty much everything you can do in Mongo you can do in Postgres, though it’s slightly less performant with the older json data type. I know from interactions with people in the community and people that are customers of ours at work that people are finding this extremely exciting - being able to bring all their data into one place. 

Stuart Langridge:        I mean, obviously, you’ve always been able to take a big block of json and bang it into one database field, which has a big long data text string, but presumably json support in Postgres is more detailed than that.  How does it work?  Can you query against specific fields in a json document? Can you aggregate across json documents?

David Page:                Absolutely, you can query within documents, you can construct documents from relational data if you want.  Obviously you can deconstruct data as well back into relational format, and jsonb comes with some new index operators that allow you to do some really efficient indexing of json. One of the cool new features I really like is you have the ability to do queries on sub-documents.  So you can say, "show me all of the documents that contain this sub-document" and give it another json document and it will look for that within all the existing data very, very quickly. 
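The containment query Dave describes can be sketched in SQL. This is a minimal illustration, assuming a hypothetical docs table and sample data (jsonb requires PostgreSQL 9.4 or later):

```sql
-- Hypothetical table storing documents in a jsonb column
CREATE TABLE docs (id serial PRIMARY KEY, doc jsonb);

-- The default GIN operator class on jsonb indexes the @> containment operator
CREATE INDEX docs_doc_idx ON docs USING gin (doc);

INSERT INTO docs (doc) VALUES
  ('{"name": "widget", "tags": ["a", "b"], "dims": {"w": 10, "h": 20}}'),
  ('{"name": "gadget", "dims": {"w": 5}}');

-- "Show me all of the documents that contain this sub-document"
SELECT id, doc FROM docs WHERE doc @> '{"dims": {"w": 10}}';
```

The second query matches only the first row, because containment checks that the given sub-document appears within the stored document.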

Stuart Langridge:        So is that just an exact match, or do you get essentially the equivalent of a SQL LIKE query, where I can say show me the documents which contain a sub-document containing the following things?

David Page:                The sub-document is an exact match, but of course, then SQL allows you the flexibility to do sub-selects from that of course.  So you can do a like match on the results. 

Stuart Langridge:        What else are you doing to help adapt Postgres to be a competitor to or better than existing no-SQL databases?  One of the things that they tend to claim that they excel at is scaling; being able to spin up multiple separate shards without just partitioning your keys across them.  Is that something that Postgres is already good at?  

David Page:                Well, I have to be honest, most of the no-SQL databases are pretty good at that.  The reason they’re pretty good at that is because they ignore many of the ACID properties that we have to follow in a relational database, which actually makes it really, really hard to do those kinds of things in a more traditional, relational database.  But we’re doing a number of things to address that right now.  First off, we’ve got a project underway called the bi-directional replication project that’s being worked on by a number of the community members.  This builds on work that’s going to be in 9.4 which basically does logical decoding of the write-ahead log. Whenever a change is made in the database, the change gets written to the write-ahead log first, which, in all previous versions of Postgres, is a binary format log.  What logical decoding does is allow you to read that log on the fly without having to use triggers or anything like that, and return it as a set of logical changes to the data.  That then allows us to build on top of that technology things like some very powerful bi-directional replication tools, for example, so we can take those logical change sets and apply them on other servers, which allows us to do filtering of data along the way, and most importantly be very efficient, because it’s something that happens out of process.  It’s not being held up like trigger-based replication systems, where the individual transactions wait whilst the trigger executes. 
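Logical decoding can be sketched with the test_decoding example plugin that ships in 9.4's contrib. This assumes wal_level = logical and max_replication_slots > 0 in postgresql.conf, and a hypothetical accounts table:

```sql
-- Create a slot that decodes the write-ahead log via the test_decoding plugin
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Make a change...
INSERT INTO accounts (id, balance) VALUES (1, 100);

-- ...then read it back as a stream of logical changes, not binary WAL
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- Clean up the slot when done
SELECT pg_drop_replication_slot('demo_slot');
```

A replication tool would consume this change stream continuously instead of polling it, but the principle is the same.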

Stuart Langridge:        So you can bring up multiple different shards and then have individual subsets of your data replicate out to those shards for failover, or just shard the data across different servers for scaling, or however you might want to do it.

David Page:                Yeah, that’s one potential use case.  At the moment it’s in the fairly early stages.  The infrastructure is in our 9.4 release to actually decode the log files, and everything else is kind of modular.  You can build code that will read those logs and do whatever you need with them, and obviously there are projects getting under way in the community to do all sorts of things with replication, auditing, and so on and so forth.  One of the other areas where we’re working on addressing the needs of users that are working with unstructured data is our foreign data wrappers, which I’m a big fan of and very interested in.  These allow you to load a driver - for example, you can load a MongoDB driver - and then set up a query or a database at the MongoDB end, which is represented as a regular table within Postgres. What this allows you to do is connect to those MongoDB servers that need to be running Mongo for whatever reason and query them as if they were a structured data source, as with any other table.  We have a bunch of foreign data wrappers now for everything from LDAP, Twitter, Mongo, Couch, ODBC, JDBC, and even the other relational databases like MySQL and Oracle. This brings Postgres to being sort of the central data store or data source, because you can connect to everything else from Postgres.

Stuart Langridge:        So I can essentially use almost any data source I like as basically a back-end for Postgres and just do everything by talking to my Postgres database?

David Page:                Absolutely, there’s a proof-of-concept FDW written for Twitter, so you can do select star from the Twitter stream.  There are other FDWs that know how to read CSV files. It is very powerful, and gives you ways of doing imports, doing data loads, running reports across multiple data sources - no end of possibilities, really.
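The CSV case Dave mentions can be sketched with file_fdw, a foreign data wrapper that ships in Postgres contrib. The file path and columns here are hypothetical:

```sql
-- file_fdw ships with Postgres (contrib); installing extensions needs superuser
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Expose a CSV file on disk as if it were a regular table
CREATE FOREIGN TABLE sales_report (
  region text,
  total  numeric
) SERVER csv_files
  OPTIONS (filename '/data/sales.csv', format 'csv', header 'true');

-- Query the file with ordinary SQL, joins and aggregates included
SELECT region, sum(total) FROM sales_report GROUP BY region;
```

Wrappers for MongoDB, MySQL, Oracle and the rest follow the same pattern: create a server for the driver, then declare foreign tables that map onto the remote data.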

Stuart Langridge:        Absolutely, that’s fascinating. You mentioned that these are being written by the community, so you mentioned that you’ve got a reasonable sized community already, but are people generally contributing to the core Postgres itself, building extensions for it or just a large user community helping one another out?  What sorts of communities do you have and how do they interact?

David Page:                All of the above and more, really.  We obviously have a group of people that work on the core Postgres server, and that’s some pretty complex technology, so it’s not something that everybody wants to get involved in.  We have other people that are working on drivers or just spend a little bit of time writing a foreign data wrapper or something, as well as the development community for the server itself and the add-ons to the server. We have people working on things like alternative replication tools, on management tools, drivers for different languages, really it’s a pretty vast ecosystem around the core server.  In addition to those people, we also have people that are users, they come to conferences, they join us there for a day’s worth of talks, and maybe go there in the evening.  We also have regular user groups in lots of cities. It’s pretty diverse.

Stuart Langridge:        If I’m using Postgres in a semi-serious way in my organization, is it worth my while getting involved in the community?  Obviously, if I’ve got tech support questions, I can show up on your forums and ask them, and I suspect people will help, but what do I get from being involved in the Postgres community?

David Page:                If you’re involved, you get the chance to help shape where Postgres is going.  Obviously the more you get involved, the more of a say you have.  If you’re going to contribute code, you get the chance to design that code.  Within the community, people don’t just show up with a patch and we commit it. It’s a very collaborative development process, but by getting involved you really do get to shape how the product will work far more than you will by anything that’s led by a single commercial entity. 

Stuart Langridge:        Let’s talk a little bit about Postgres’s traditional strengths.  Obviously it’s a relational database, and you might think of it as having two different ends.  You’ve got the high-end stuff - Oracle and SQL Server, as you mentioned - and you’ve got what’s traditionally seen as the lower end; MySQL is the obvious example here.  Are there particular areas where Postgres is stronger than the competition, where you would recommend Postgres?  And are there particular areas you’re not concentrating on, where someone would go for an alternative?

David Page:                The Postgres community is, I wouldn’t say averse to the idea, but we’re at the moment not really concentrating on an equivalent to, say, Oracle RAC.  That said, in my experience, most people who have RAC don’t actually need it.  In many cases, they could quite easily replace their system with a couple of Postgres servers, set up with some failover and appropriate monitoring. Our strength, and the thing we really do pay a lot of attention to, is technical correctness and data correctness - making sure we follow the SQL spec as closely as possible.  As far as I’m aware, Postgres is the most spec-compliant database there is.  Making sure your data’s safe and it’s stored properly.  Data is validated: when you submit a date into a column, we check that it’s a valid date.  That’s always really been the main ethos, if you like, of the project: correctness comes first.  I think it’s pretty important.

Stuart Langridge:        If you need a motto, that’s not a bad one, I think.  There’s kind of a persistent view that something small, like MySQL is simpler and therefore if you’re only putting together a small project, it’s a better choice.  If you want something larger, more complicated, something where you need to do a lot more with your data, then Postgres is better at the cost of being more difficult to use.  But is that view justified?  There are an awful lot of web hosting environments out there, which just give you a control panel and PHP and MySQL, it’s almost always MySQL.  But is that just a historical artifact or are you working on making Postgres more appealing to people in that kind of small environment?  

David Page:                Yeah, it’s an interesting question, and something that we’ve thought about long and hard and worked on over many years.  Years ago, MySQL certainly was much easier to set up and use than Postgres.  Nowadays, the situation I think we’re in is that Postgres is easier to set up and use, but Postgres has a lot more advanced features than MySQL, and those are where you get the extra capacity and flexibility. Although those advanced features are by definition more complex, we’ve done a lot of work over the years to try and improve things for new users.  Seven, eight years ago, MySQL had nice GUI installers and when you came to use Postgres, it was, well, how do I compile this from source and install it?  One of my colleagues in the community and I originally worked on some installers, and decided at that point to try and make things as simple as possible.  We made it really easy for people to get up and running on our new Windows port.  The company I work for, EnterpriseDB, took things one step further when we said, all right, we’re going to redesign the installers completely.  We recognized some of the limitations in the original ones, and nowadays we have a set of installers for Windows, for Linux, for Mac, that if people choose to use them, they get a very simple experience of just four or five clicks and it’s installed and running.  So I think these days, the argument that MySQL is easier is no longer true. 

Stuart Langridge:        You mentioned there about having different installers for different platforms.  That’s an interesting point.  Let’s talk a little about how one actually gets Postgres.  Do you find that most people are deploying on Linux hosts?  Are you expecting them to use the Postgres from their choice of Linux operating system, or do you encourage people to install from your own packages?

David Page:                Another good question. Most people will use the installers for testing things out, for prototyping, or for working on their laptops. When they deploy, what they will tend to do is move either to one of the RPM distributions or to Debian or Ubuntu.  Now, what we found is that, with Red Hat for example, Red Hat are on a five-year release cycle for RHEL.  So they’ll lock onto one version of Postgres, one major version, at the beginning of that release cycle, when they’re just getting ready to go to beta, and then they’ll stick with that version - so I think it’s Postgres 8.4 in RHEL 6 for example, and I think 9.1 in RHEL 7.  That really doesn’t help a lot of users who want some of the newer features in Postgres.  Postgres is advancing very quickly.  We have major releases every year and we add new features.  We never add new features to minor releases.  People want to get those new features - people are going to want jsonb support in PostgreSQL 9.4 for example.  So one of the things that the community does is maintain both a yum repository and an APT repository, where users, if they prefer not to use the vendor-supplied copies of Postgres, can come to us and get whatever version they want for whatever version of RHEL, Fedora, Ubuntu or Debian they want. 

Stuart Langridge:        So you’re working to make sure that - obviously you’ve got a certain amount of combinatorial explosion, with various different versions of RHEL, of Ubuntu server, of Debian, and various different versions of Postgres.  So what do you do about supporting those different versions?  Are all versions of Postgres supported on all releases of Ubuntu server?

David Page:                No, we tend to phase out the older versions of Ubuntu as they get replaced by newer versions.  We won’t stop creating new builds of a particular version of the Postgres server for it; we just might not support it for future versions of the Postgres server.  In the case of RHEL it tends to change slowly, so it’s not really a major issue.  It’s more of a problem with Fedora of course, where we’re in the same sort of boat as with Ubuntu, but we do have a pretty wide selection.  I think we normally support somewhere around the last four or five versions of Fedora and Ubuntu with any given release of Postgres.

Stuart Langridge:        That makes sense.  I know you’ve been working with Bitnami's BitRock packaging technology as well to put together Postgres.  How’s that been?

David Page:                Fantastic, they’re a great bunch of guys.  I love working with them, to be honest.  We use their installers for the Postgres installers that we build.  We also use them for all of our products within EnterpriseDB.  The technology is great because I can write an installer once and it runs on a ton of different platforms - not just every version of Linux, but they’ll run on HP-UX, on AIX, whatever I need.  There’s not much more I can say.  It’s great technology and they’re a really nice bunch of guys.

Stuart Langridge:        So talking about doing new releases and so on, what’s your release strategy, what’s the cadence?  How often do you put out new releases?  What’s the policy on when a new release comes out, what it should have in it, that sort of thing.

David Page:                We put out a new major release of Postgres roughly every 12 months, give or take a few weeks.  In those new major releases, we’ll have new features.  We may have database format changes, so you’re required to go through an upgrade process to get from one major release to another.  Each of those major releases will have what we call point releases or update releases that are pushed out when there’s a need.  So if we find a particularly serious bug, or we collect a handful of minor bugs, we’ll get to a point where we say it’s time to release.  Those releases never, ever change data formats or APIs, or add features.  And the golden rule as far as Postgres is concerned is that we always consider it safer to upgrade to a minor release than not to upgrade, and I think that’s really important.  We sometimes find people who have stayed on the .1 release for ten years.  You get heart palpitations trying to give them support: how many bugs are in that version they’re running that we fixed long ago?  Once a major release is out, from that date we consider it fully supported in the community for five years.  There are various companies that work with the community and provide Postgres support, most of whom we know well and who contribute to the project.  Some of those will support a release for longer periods, seven or even ten years, but obviously that’s paid-for support. 

Stuart Langridge:        Yeah. And a major release in this context is 9.4 as an example, so not 9. 

David Page:                9.0 was a major release, but if either the first or the first and second or just the second digit change, that’s a new major release.  The third digit is always a bug fix release, and with the installers, we also have a build number on the end, so it’ll be sort of 9.3.4-1. 
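The pre-10 numbering scheme Dave describes can be sketched in SQL with split_part (a toy illustration, not anything Postgres itself does):

```sql
-- Under the pre-10 scheme, the first two digits form the major release,
-- the third is a bug-fix release, and installers add a build number.
SELECT split_part(v, '.', 1) || '.' || split_part(v, '.', 2) AS major_release,
       split_part(v, '.', 3)                                 AS bugfix_release
FROM (VALUES ('9.3.4')) AS t(v);
-- major_release = '9.3', bugfix_release = '4'; the installer's '-1' suffix
-- would be the build number on top of that.
```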

Stuart Langridge:        So I’ve got Postgres from my choice of place, either from a package repository or from using one of your installers. One of the big things that’s happening now is people deploying stuff into the cloud, so instead of building things in your own data center, you’re deploying to EC2 or Rackspace or HP’s cloud or any one of a hundred different cloud providers.  Are you seeing a lot of people doing that with Postgres now, and what kind of work are you doing to make Postgres good in that sort of environment?

David Page:                Absolutely, we’re seeing people doing it.  To be honest, there’s not a huge amount of work going on in the community on that front, because Postgres just works in the cloud.  It doesn’t care whether they’re on virtualized hardware or real hardware or a virtualized machine that you know the physical host for is in the room next door - it just doesn’t care.  Within EnterpriseDB, we have a product called Postgres Plus Cloud Database, which does take advantage of some of the cloud features and actually is basically a management system that will let you do one click deployments of clusters of servers with replication between them.  So you have a read-write master and then one or more read-only slaves with load balancing and what have you over the top of it.  We add cloud-like features to that, such as auto-scaling so we can automatically increase the amount of storage you have.  We can spin up additional read replicas if you need additional capacity, that sort of thing.

Stuart Langridge:        Yeah, and I believe you’ve been working with the OpenStack project as well.

David Page:                We’re using it a lot internally within EDB for testing the Postgres installers and for testing the proprietary version of Postgres as well.  But also we’re in the process of porting Postgres Plus Cloud Database across so that it will be available as a service that some of our larger customers can deploy in their cloud infrastructures. 

Stuart Langridge:        We talked a little bit earlier about Postgres being easier than we think it is to set up and so on and so forth.  What about the level of experience you’d expect someone to have to run Postgres in production, if you bring it into your organization or install it in the cloud for your organization?  What level of experience do you think people need with Postgres to be a good DBA?  Is it something that’s possible to do as a relative novice, or do you think that you need strong sysadmin skills to get the best out of it?

David Page:                It really depends what you’re doing with it.  I mean, if you’re trying to run a 10-terabyte database and you’ve got 100 users connecting to it constantly and running data warehouse-type queries, then yeah, you’re going to need someone who is familiar with tuning Postgres and familiar with tuning the operating system and the hardware to get it to run well.  However, most databases aren’t that big.  We hear about web scale so much these days, and it always annoys me, because so many of the systems I see and come across on a daily basis are nothing like that big. I have customers running 10-megabyte databases.  They’re growing, but they’re very, very tiny.  In reality, I think the majority of the databases I see are probably just in the range of a few gigabytes, and on modern hardware, you can install Postgres and run a typical app - say a helpdesk or an asset management system that’s used in a company by a couple hundred people - with virtually no skills at all. 

Stuart Langridge:        That’s ideal for someone like me if nothing else, but it’s useful to hear that.  So you’ve already spoken a little bit about jsonb and moves towards the no-SQL environment and so on.  Where else is Postgres going next?  What are your plans for the next six months, next year, long term plans? 

David Page:                It’s difficult to say. One of the interesting things about working in an open source community is that there doesn’t tend to be a long-term road map, because all of your developers are volunteers, so they tend to scratch whatever itch they’ve got at the time as a general rule. We’ll take code for new features as long as they make sense, are reasonable features, are properly written, etcetera - we’ll generally add them to Postgres.  I think the big trend at the moment really is more towards clustering - the bi-directional replication project, for example.  People have been looking at and working on using the foreign data wrappers for sharding.  I think in general we’re going to be heading more in that sort of direction.  That said, there are always more SQL features to add.  There are always better ways to write the optimizer. There’s the parallelism work, for example: two of my colleagues at EDB are adding support for parallelism within the server, so that it can have multiple processes running concurrently to handle the same query, that sort of thing.  There’s plenty of work for us to do in lots of different areas.  As other technologies around us change, there are going to be more things that people want, new ideas that people come up with, so I don’t think that we’re ever going to say that Postgres is done, it’s finished. 

Stuart Langridge:        At no point do you get to "down tools", I'm afraid.

David Page:                Right, until I retire.

Stuart Langridge:        And one final, totally critical question, we’ve been discussing the project and calling it “Postgres” but it’s actually called PostgreSQL or is it called post-gres-sequel or –

David Page:                No, it’s not post-gres-sequel, and it’s not “postgree” as we often hear. The official project name is PostgreSQL - postgres as one word, obviously, with QL on the end.  However, the word Postgres has been officially accepted by the core team as a short name to use instead.  So either Postgres or PostgreSQL. 

Stuart Langridge:        Hooray, I shall continue calling it Postgres.  Excellent.  Thank you very much for talking to us, Dave, and where would people go to find out more about Postgres? 

David Page:      

Stuart Langridge:        Excellent, thank you very much indeed.  Dave Page, of the Postgres core team. 

[End Audio]