Wednesday, May 1, 2013

Moving to C++

GCC 4.8 was recently released. It is the first GCC release written in C++ instead of C, which got me thinking ...

Would this make sense for PostgreSQL?

I think it's worth a closer look.

GCC's job actually isn't that different from PostgreSQL's. It parses language input, optimizes it, and produces some output. It doesn't have a storage layer; it just produces code that someone else runs. Also note that Clang and LLVM are written in C++. I think it is fair to say that these folks are well informed about selecting a programming language for the job.

It has become apparent to me that C is approaching a dead end. Microsoft isn't updating its compiler to C99 and advises people to move to C++ instead. So as long as PostgreSQL (or any other project, for that matter) wants to support that compiler, it will be stuck on C89 forever. That's a long time. We have been carefully introducing the odd post-C89 feature, guarded by configure checks and #ifdefs, but either that will come to an end, or the range of compilers that actually get the full benefit of the code will become narrower and narrower.

C++ on the other hand is still a vibrant language. New standards come out and get adopted by compiler writers. You know how some people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you could have that sort of problem for your C code! With C++ you reasonably might.

I also sense that by now there are more C++ programmers than C programmers in the world, so using C++ might help grow the project. (That is the same theory under which supporting Windows natively would attract hordes of Windows programmers to the project, which probably did not happen.)

Moving to C++ wouldn't mean that you'd have to rewrite all your code as classes or that you'd have to enter template hell. You could initially treat a C++ compiler as a pickier C compiler, and introduce new language features one by one, as has been done before.

Most things that C++ is picky about are things that a C programmer might appreciate anyway. For example, it refuses implicit conversions from void pointers to other pointer types, and it refuses intermixing different enums. Actually, if you review various design discussions about the behavior of SQL-level types, functions, and type casts in PostgreSQL, PostgreSQL users and developers generally lean toward a strict type system. C++ appears to be much more in line with that thinking.
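
To make the void-pointer and enum points concrete, here is a minimal C++ sketch. The types `Point`, `Color`, and `Shape` and the functions `make_point` and `shape_from_color` are hypothetical, made up only for illustration; the comments show the C spelling that a C++ compiler would reject.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical types used only for this illustration.
struct Point {
    int x;
    int y;
};

enum Color { COLOR_RED, COLOR_GREEN };
enum Shape { SHAPE_CIRCLE, SHAPE_SQUARE };

Point *make_point(int x, int y)
{
    // C accepts "struct Point *p = malloc(sizeof(struct Point));" silently.
    // A C++ compiler rejects the implicit void * -> Point * conversion,
    // so the cast has to be spelled out.
    Point *p = static_cast<Point *>(std::malloc(sizeof(Point)));
    if (p == nullptr)
        return nullptr;
    p->x = x;
    p->y = y;
    return p;
}

// C also accepts "enum Shape s = COLOR_GREEN;", because enumerators are
// plain ints there. In C++ that initialization is a compile error, so
// converting between unrelated enums needs an explicit, greppable cast.
Shape shape_from_color(Color c)
{
    return static_cast<Shape>(c);
}
```

The code itself is valid C++; the interesting part is what a C++ compiler would refuse to accept in its place.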

There are also a number of obvious areas where having the richer language and the richer standard library of C++ would simplify coding, reduce repetition, and avoid bugs: memory and string handling; container types such as lists and hash tables; fewer macros necessary; the node management in the backend screams class hierarchy; things like xlog numbers could be types with operators; careful use of function overloading could simplify some complicated internal APIs. There are more. Everyone probably has their own pet peeve here.
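
As an illustration of "xlog numbers could be types with operators", here is a minimal sketch assuming a hypothetical `LogPosition` made of a log file id and a byte offset. It is not PostgreSQL's actual `XLogRecPtr` definition; the point is only that comparisons and arithmetic become ordinary, type-checked operators instead of macros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical WAL-style position: a log file id plus a byte offset.
// Not PostgreSQL's real definition; it only illustrates the idea.
struct LogPosition {
    std::uint32_t log_id;   // which log file
    std::uint32_t offset;   // byte offset within that file

    // Collapse to one 64-bit value so comparisons are a single integer compare.
    std::uint64_t as_u64() const
    {
        return (static_cast<std::uint64_t>(log_id) << 32) | offset;
    }
};

inline bool operator<(LogPosition a, LogPosition b) { return a.as_u64() < b.as_u64(); }
inline bool operator==(LogPosition a, LogPosition b) { return a.as_u64() == b.as_u64(); }
inline bool operator<=(LogPosition a, LogPosition b) { return !(b < a); }

// Advance by a byte count, carrying into log_id when the offset overflows.
inline LogPosition operator+(LogPosition p, std::uint32_t bytes)
{
    std::uint64_t v = p.as_u64() + bytes;
    return LogPosition{static_cast<std::uint32_t>(v >> 32),
                       static_cast<std::uint32_t>(v)};
}
```

Passing a plain integer where a `LogPosition` is expected becomes a compile error, which is the kind of strictness the paragraph above argues for.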

I was looking for evidence of this C++ conversion in the GCC source code, and it's not straightforward to find. As a random example, consider gimple.c. It looks like a normal C source file at first glance. It is named .c after all. But it actually uses C++ features (exercise for the reader to find them), and the build process compiles it using a C++ compiler.

LWN has an article about how GCC moved to C++.

Thoughts?

28 comments:

  1. Better to start from scratch, as a multithreaded, multicore database.

    1. You're right! Now that we have a good database engine that's widely praised and gaining mindshare rapidly, it's time to scrap all that working code and start over with radical new ideas.

    2. Postgres is a good database, but it was designed 25 years ago, and a lot of its internal components have aged. I see no point in spending time and energy on a port to C++ without significant refactoring and redesign. Many internal patterns (lists, nodes, the executor, error handling) would probably have to be implemented differently in C++. But deep changes bring deep bugs and issues; anybody who went through KDE4 or GNOME3 knows that.

    3. I'll see you in 25 years when your database is done

      ...and 25 years out of date.

  2. Being neither a C++ programmer nor a C one, I find C much easier to read, and it has certainly given me a lot less hassle compiling.

    The biggest problems I've had are with integrating C++ code dependencies (e.g., GEOS for PostGIS, and even the V8 engine for PL/V8). That's where all my crashes happen. I'm not sure whether your proposal makes things better or worse. I suspect worse, just because I get the (possibly misguided) feeling that C++ is more sensitive to things like which GCC version each part is compiled with and ABI compatibility (which I still don't really understand).

    I guess I'm just saying that PostgreSQL and related projects, e.g. PostGIS, have their own dependencies, many of which are not under their control and not even guaranteed to be compiled by the same group. So a thorough review of those is necessary before jumping onto a hot plate. Even as a Windows user/developer, I take anything Microsoft does or says with a grain of salt, simply because I know that what they say is often how they would like things to be, and they will be driven by inertia in whatever direction it is swinging. I'd rather predict the direction of the wind than pay attention to what they are doing.

  3. Peter, just wondering: could Google Go be a good language choice today for creating a new database programme?

    http://talks.golang.org/2012/splash.article

    1. And use a language that does not allow you to disable the GC instead? Not to mention the lack of generics. I don't think so. :)

    2. I think so, yes. But of course Go is fairly new, so the jury is still out, so to speak.

  4. IMO PostgreSQL could use a lot of C++. But I'm afraid it would require pretty much a rewrite.
    And I don't think some of the top developers would be happy to even consider it.
    Don't get me wrong, the PostgreSQL code is very nice, but as far as code goes, it has long functions, global variables, and all sorts of other ugly concepts that could use a rewrite in C++.

    1. I'm interested in researching a gradual change, like GCC has done. A complete rewrite is neither realistic nor useful.

  5. Linus Torvalds puts it well: "Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C."

    http://harmful.cat-v.org/software/c++/linus

    1. If Linus says it, it must be true!

    2. I suspect Linus would either grin or denounce you as a fool - depending on circumstances!

  6. Using a C++ compiler as a better (supported) C compiler should not bring too many problems and would ensure continued support on many platforms. One concern could be embedded platforms, but perhaps even those have better C++ support than they did in the past.

    And taking the pragmatic route like GCC has done (switch over, but don't rewrite the whole system to make it fit, if ever) should not change that much. And it gives you the option to start using some features in the future if they are deemed beneficial by the core developers.

  7. Is C++ really the future of C?

    I'd rather hoped there would be a better way of introducing some higher-level primitives without introducing all of the cr*p that C++ brings with it.

    Maybe I'm old fashioned but I think it's a very bad decision of M$ not to support newer standards of a very proven technology.

    1. However, writing Microsoft as m$ doesn't make you sound old fashioned; it makes you sound like you're 15. Please stop writing like a fanboy. You bring no argument to the conversation.

  8. The GCC migration is, I think, the RIGHT way to do it for a mature project like PostgreSQL. Don't let the scope turn into a situation where you ALSO need to rewrite everything; just start, to a reasonable degree, taking advantage of the features as they become available.

  9. How would you deal with longjmp/setjmp not working in C++ land for RAII/non-POD objects?
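
    For context: PostgreSQL's error handling unwinds with sigsetjmp/siglongjmp (behind the PG_TRY/PG_CATCH macros), and in C++ a longjmp is undefined behavior if replacing it with a throw would run non-trivial destructors. One possible approach, sketched below under simplified assumptions, is to keep every frame between setjmp and longjmp free of RAII objects and to call such code only through a boundary function. The names `raise_error`, `c_style_divide`, `guarded_divide`, and `describe_division` are hypothetical; this models the problem and is not PostgreSQL's actual mechanism.

```cpp
#include <cassert>
#include <csetjmp>
#include <string>

// Hypothetical stand-in for an elog(ERROR)-style path that unwinds via longjmp.
static std::jmp_buf *error_target = nullptr;

[[noreturn]] static void raise_error()
{
    std::longjmp(*error_target, 1);
}

// C-style code: only trivially destructible locals, so a longjmp out of
// this frame skips no destructors.
static int c_style_divide(int a, int b)
{
    if (b == 0)
        raise_error();
    return a / b;
}

// Boundary: setjmp lives here, and no object with a non-trivial destructor
// exists between this frame and the longjmp. "saved" is not modified after
// setjmp, so its value is still well defined after the jump.
static bool guarded_divide(int a, int b, int *result)
{
    std::jmp_buf buf;
    std::jmp_buf *saved = error_target;
    error_target = &buf;
    if (setjmp(buf) != 0)
    {
        error_target = saved;
        return false;
    }
    *result = c_style_divide(a, b);
    error_target = saved;
    return true;
}

// C++ code with RAII objects only calls the boundary, so the longjmp never
// crosses the frame that owns the std::string.
std::string describe_division(int a, int b)
{
    std::string msg = std::to_string(a) + "/" + std::to_string(b);
    int r;
    if (!guarded_divide(a, b, &r))
        return msg + " failed";
    return msg + " = " + std::to_string(r);
}
```

    The cost of this design is discipline: every C++ function that owns objects with destructors has to stay on the far side of such a boundary, which is exactly what makes the question hard for a large existing code base.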

  10. Do not use Microsoft :)

  11. Someone did implement a Postgres-inspired RDBMS in C++. It is called electrondb. I was an advisor to them for a couple of years. They are kind of in stasis, awaiting more VC funding, and I'm trying to convince them to go open source. But on a technical note, a couple of guys wrote about three quarters of a million lines of code in about 4 to 5 years. As a C guy (ex-Sybase architect) I found reviewing this code a bit counterintuitive, but it is not really that difficult once you get used to it. Template syntax is a pain, though. Not sure about performance; we were starting to do TPC runs when things went into a hiatus.

    So, to cut a long story short, it can be done and the end result is acceptable. But how maintainable the code is, how portable it is across platforms, and where the compiler and standard library trip you up can be determined only if this code sees production. Drizzle should give sufficient pointers on this, since it is open source, though not Postgres-inspired.

    On a related note, I develop Linux kernel code and an L4 microkernel. One is in C and the other is in C++. I need C++ for some of our security research design paradigms, so I have no choice there. The C++ toolchain is a pain, but microkernels are less than 20k lines of code, so we can live with it. I'm not sure such code could take contributions at the scale of Linux. Unless the competence of the average (not the guru!) driver writer improves, C++ is best avoided.

  12. C++ does not suit every application. For example, the kernel cannot do without C, and some assembly code is inevitable too. So C++ is NOT a drop-in replacement for C; it's just another programming language. Whether it helps the project in the long run depends on an analysis of which C++ features are needed versus which C features will be lost, and whether it's all worth it.

    Here is a good talk by Herb Sutter on why C++, for those who might be interested:
    http://channel9.msdn.com/posts/C-and-Beyond-2011-Herb-Sutter-Why-C

    1. Everything C can do, C++ can do too. With C++, it is easier to write type-safe code, and C++ supports cleaner code in several significant cases. It never requires "uglier" code like those ad hoc macros in Linux.

      If you look at the way the Linux kernel uses macros combined with GCC extensions like typeof(x), it is obvious that they are actually writing templates. And many of their struct definitions reproduce inheritance and virtual method calls.

      You could look at it as writing C++ code disguised as C.
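
      To illustrate the comparison, here is a minimal sketch: a simplified typeof-style max macro (shown only in a comment, since typeof and statement expressions are GCC extensions, and the real kernel macros are more elaborate) next to a function template, and a hand-built function-pointer table next to a virtual function. All names (`max_of`, `shape_ops`, `c_square`, `Shape`, `Square`) are hypothetical and not taken from the kernel.

```cpp
#include <cassert>
#include <string>

// Simplified form of the kind of macro described above (comment only):
//   #define max(x, y) ({ typeof(x) _x = (x); typeof(y) _y = (y); _x > _y ? _x : _y; })
//
// A function template gives the same "evaluate each argument once, work for
// any comparable type" behavior, and the compiler checks that x and y agree.
template <typename T>
T max_of(T x, T y)
{
    return x > y ? x : y;
}

// C-style "inheritance": a table of function pointers that each concrete
// type fills in by hand (hypothetical, not an actual kernel struct).
struct shape_ops {
    double (*area)(const void *self);
};

struct c_square {
    const shape_ops *ops;
    double side;
};

static double c_square_area(const void *self)
{
    const c_square *sq = static_cast<const c_square *>(self);
    return sq->side * sq->side;
}

static const shape_ops c_square_ops = { c_square_area };

// The same dispatch expressed with a C++ virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};
```

      In the C version nothing stops a caller from passing the wrong `self` pointer; in the C++ version the compiler wires up and checks the dispatch.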

  13. I believe starting to write new parts of PostgreSQL in C++ would be a smart move. As a start, I would make the entire PostgreSQL code base compile with a C++ compiler.

  14. In 2006-2008, at Dataupia, Postgres was already ported to C++ AND converted from a multi-process architecture to a multi-threaded single process. It was closed source (part of a high-performance parallel database grid appliance). I left Dataupia in 2008 and don't know what happened to that code base in the last 3-4 years. As for the conversion itself: it was a HUGE effort, and the people involved had previously invented bitmap indexes and written major commercial database engines. Knowing the amount of work needed, I would strongly advise against this initiative.

    1. Well, changing PostgreSQL from multiprocess to multithread would likely be a 3-4 year effort as well, but no one is proposing that. I am in fact arguing for not changing the architecture at all, just upgrading the language slightly.

  15. I support using C++ to compile the C-based PostgreSQL code. There is no need to change the architecture or add any class or template features. I have never read the PostgreSQL source code, but it is entirely possible to link C code with C++ objects and to intermix C and C++ in the same program. I would suggest:
    a) Try to compile the code base to C++ one file at a time. Test and fix along the way.
    b) Consider using C++ features where they make things easier or improve performance.

    My experience at a previous company:
    a) They did that, and it worked for C programs.
    b) Programs that used C++ templates took very long to compile.
    c) Programs that used deep and multiple inheritance had memory size and performance problems, and we could not easily debug them line by line (this was 9 years ago, when memory was limited and expensive on Sun systems). It was worse when arrays of C++ objects were created, as each object has to call its own constructor and its parents' constructors.
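
    The usual mechanism for intermixing C and C++ in one program is C language linkage: declarations wrapped in `extern "C"` are not name-mangled, so C++ code links against symbols produced by the C compiler. Below is a minimal sketch assuming a hypothetical C function `c_count_chars` and a hypothetical C++ wrapper `count_chars`; in a real mixed build the definition would sit in a .c file, and it is included here only so the example is self-contained.

```cpp
#include <cassert>
#include <string>

// Typical shared-header idiom: C compilers ignore the guard, while C++
// compilers give the declarations C linkage (no name mangling).
#ifdef __cplusplus
extern "C" {
#endif

int c_count_chars(const char *s, char c);

#ifdef __cplusplus
}
#endif

// Normally compiled by the C compiler from a .c file. Here it is compiled
// as C++, and it keeps the C linkage of the earlier declaration.
int c_count_chars(const char *s, char c)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

// New C++ code can wrap the C API with richer types.
int count_chars(const std::string &s, char c)
{
    return c_count_chars(s.c_str(), c);
}
```

    This only addresses symbol names; the ABI concerns raised earlier in the thread (mixing standard library builds or compiler versions across C++ interfaces) are a separate matter.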

  16. I've done a lot of work in C++ over the years, including porting a pretty large program from Fortran 77 to C++. And as a newbie to the PG code base, there are a few things that strike me as being cleaner in C++. For example, if linked-list elements could inherit from {d|s}list_node, rather than embed it as a member, then the list-handling code could be functions rather than macros if one wanted. (I'm not a fan of macros except where necessary.)
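
    A minimal sketch of what that could look like, using a hypothetical intrusive list (`list_node`, `list_head`, `element_of`, and `Task` are made-up names, loosely modeled on the embedded-node idea and not PostgreSQL's actual ilist.h API): the element inherits from the node, and a type-checked function template takes the place of a container_of-style macro.

```cpp
#include <cassert>

// Hypothetical intrusive doubly linked list node.
struct list_node {
    list_node *prev = this;
    list_node *next = this;
};

// Circular list with a sentinel head node.
struct list_head {
    list_node head;

    bool empty() const { return head.next == &head; }

    void push_tail(list_node *n)
    {
        n->prev = head.prev;
        n->next = &head;
        head.prev->next = n;
        head.prev = n;
    }
};

// In C, the element embeds the node and a macro such as
//   container_of(ptr, type, member)  ((type *) ((char *) (ptr) - offsetof(type, member)))
// recovers the element. With inheritance the element IS a node, so a plain
// function template does the downcast.
template <typename T>
T *element_of(list_node *n)
{
    return static_cast<T *>(n);
}

struct Task : list_node {
    explicit Task(int p) : priority(p) {}
    int priority;
};

// Walk the list with ordinary functions; no macros needed.
int total_priority(list_head &list)
{
    int sum = 0;
    for (list_node *n = list.head.next; n != &list.head; n = n->next)
        sum += element_of<Task>(n)->priority;
    return sum;
}
```

    The `static_cast` is only valid when the node really belongs to a `Task`; that is the same contract the C macro relies on, just expressed in a form the compiler can partially check.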

    But I think there's a considerable risk of a slippery slope, which I've run into in the past. So many of C++'s features look like perfect solutions to certain problems (lambda functions, exceptions, templates of all kinds, inheritance, etc.) But it's so easy to *start* using those sensibly, and then wake up two months later in a ditch with a hangover, a new tattoo, an incomprehensible mess of "idiomatic" C++ code, and no idea how you got there.

    If PG could be ported to a small, well-disciplined subset of C++ that made life better rather than worse, it could be a win. But coming to a good consensus about what subset that is, and sticking with it, might be a bit harder.
