July 04, 2009

Sam Stephens

Using Python to Characterize Semiconductors

For several years I have been characterizing semiconductors using oscilloscopes, spectrum analyzers, RF sources and a whole assortment of other measurement equipment. I quickly noticed definite patterns in the tests I was performing, which makes them ideal for automation. In addition, many standards such as PCIe, SRIO, XAUI and SGMII include very time-intensive requirements, all of which need to be verified through test. For example, a simple receive tolerance test can take literally days to complete, depending on the required confidence level and the number of features that a particular transmitter/receiver pair offers. In all of my testing I have found that Python can be used to make this work simpler. In this blog I hope to share some of the code and tricks that I have used to do this type of testing.
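
As a rough illustration of why such tests run for days, here is a back-of-the-envelope sketch (not code from an actual test bench). It uses the standard zero-error relation N = -ln(1 - CL) / BER for the number of bits that must be observed without error to claim a bit error ratio below BER at confidence level CL. The 2.5 Gb/s line rate, the 1e-12 target and the 500 setting combinations are made-up values chosen only for illustration, and the function names are hypothetical.

```python
import math


def bits_required(target_ber, confidence):
    """Bits to observe with zero errors to claim BER < target_ber at this confidence."""
    return -math.log(1.0 - confidence) / target_ber


def seconds_required(target_ber, confidence, bit_rate):
    """Time needed to send that many bits at the given line rate (bits per second)."""
    return bits_required(target_ber, confidence) / bit_rate


# Hypothetical numbers: one measurement point at 2.5 Gb/s, BER < 1e-12, 95% confidence.
seconds_per_point = seconds_required(1e-12, 0.95, 2.5e9)

# Hypothetical sweep: 500 combinations of transmitter and receiver settings.
combinations = 500
total_days = seconds_per_point * combinations / 86400.0

print("minutes per point: %.1f" % (seconds_per_point / 60.0))
print("days for the full sweep: %.1f" % total_days)
```

With these assumed numbers a single point takes about 20 minutes, and the full sweep runs for about a week, which is why raising the confidence level or adding features to the sweep quickly becomes expensive.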

by Sam's Place (noreply@blogger.com) at July 04, 2009 03:02 AM

July 02, 2009

Matthieu Brucher

Setting up a Redmine application with Apache: which module to use?

In March, I set up a Redmine application with the Ruby webserver Webrick. Since then, I’ve migrated to Apache, which raised the question: which Ruby bridge module should I use? The choice is not large: you have mod_fastcgi, mod_fcgid and mod_rails, a.k.a. Passenger. I tried all three of them, and only one was a success.

As in my last post about Redmine, I’ve compiled everything (Apache included) in a custom location and I start the server from there (without root rights).

mod_fastcgi

This is an old module, but it should be easy to use.

After setting up the configuration file to point at the public/dispatch.fcgi file, I started the server (did I say that Ruby does a wonderful job by providing every wrapper file needed for any webserver? It provides cgi and fcgi skeleton files, and of course everything needed for Webrick). Unfortunately, it didn’t work: it could not find the rubygems module, although it is installed, since I used it to install Ruby on Rails!

So I left mod_fastcgi where it was, in the dust.

mod_fcgid

This module is based on FastCGI as well, but is more recent than mod_fastcgi. It uses Unix sockets for the communication between Apache and the back-end, and in my configuration the server didn’t seem able to create sockets. I didn’t find a reason on the Internet, but I have to say I didn’t look far; besides, Google does not have much to say about this issue, and I still had Passenger to try. So much for this module.

Passenger

This module is supposed to interact directly with the ruby files, and in fact it does. No need to have a dispatch.fcgi file, it calls the server as Webrick does.

I installed rake as a gem, but the Passenger setup didn’t find it, so I had to add a symbolic link between lib/ruby/gems/1.8/bin/rake and bin/rake. I installed fastthread as well, and I was good to go.

In the configuration file, I load the appropriate modules:

LoadModule passenger_module /web/src/passenger-2.2.2/ext/apache2/mod_passenger.so
PassengerRoot /web/src/passenger-2.2.2
PassengerRuby /web/bin/ruby

Then I can configure the Virtual Host:

NameVirtualHost *:3000
<VirtualHost *:3000>
 
DocumentRoot /src_custom/redmine-0.8.3/public
 
<Directory /src_custom/redmine-0.8.3/public>
AllowOverride None
Order allow,deny
allow from all
Options Indexes FollowSymLinks MultiViews
</Directory>
 
ErrorLog /web/logs/error.log
LogLevel warn
CustomLog /web/logs/access.log combined
ServerSignature On
</VirtualHost>

Conclusion

With Passenger, it was really easy to have a working configuration. The only thing I’m still missing is the ability to set the RAILS_ENV variable to select a different environment than the default (production).


by Matt at July 02, 2009 08:19 AM

July 01, 2009

Gaël Varoquaux

PhD proposal on material-science modeling with Python

Philippe Baucour from the ‘Université de Franche-Comté’ sent me an email saying that he was looking for the rare PhD candidate who would be able to do numerical modeling and material science on top of high-quality Python coding. I can sympathise with this quest: it is very hard to find someone who codes well, let alone someone who can also do numerical modeling!

His PhD proposal, originally written in French, is given below in English. Contact him for more information. Please don’t contact me, I am drowning under (very interesting) e-mail.

Using parallel computing for the fractal modeling of a PEMFC-type stack.

PhD grant proposal
Theme 1.g: Modeling, simulation and high-performance computing
Theme 2.a: Energy, processes, environmental impacts, energy storage

Supervisors within the ENISYS department of FEMTO-ST, Modeling team
D. Hissel, M.C. Péra, R. Glises, Ph. Baucour.

The phenomena that take place within a PEMFC-type stack are multi-physics and multi-scale in nature. The behavior of a complete stack can therefore be understood as a whole only by integrating fields as different as:

  • Electrical and electrochemical phenomena,
  • Fluid mechanics,
  • Heat and mass transfer phenomena.

All these disciplines interact at completely different scales: from the catalytic deposit (i.e. ~µm) to the stack (i.e. ~m), a scale factor of about 10^6. In addition, the time constants of the various phenomena also differ widely and add to the complexity of the problem.

There are a great many studies on fuel-cell modeling, but the difficulties stated above lead to restrictions on the domain studied (a single cell), the geometry (1D or 2D, rarely 3D) or the representation of the phenomena (system-level modeling). Moreover, the computing power required for this kind of strongly coupled, non-linear problem is not easily accessible.
The planned work consists in developing a complete 3D model of a stack at all scales of both time and space. The planned approach is to use a fractal model that can be partitioned and adapted to all the scales (time and space) present in a stack. Designing a modular code would eventually make it possible to test certain hypotheses about how PEMFCs operate. Examples include:

  • Water management and gas humidification.
  • Cold start.
  • Operation in degraded mode.
  • Design of the gas supply channels.
  • Study of durability and reliability through numerical cycling.

The laboratory (ENISYS) recently acquired a computing cluster that makes a complete model conceivable. It is made up of 8 compute nodes with a total of 64 processors, 64 GB of memory and 1 TB of disk space.
The goal of the thesis would be to develop a parallel code that distributes a complete model over the 64 cores. This model can be seen as the aggregation of models at different scales:

  • Membrane Electrode Assembly model
  • Non-conservative flow model in a channel (already developed)
  • Thermal behavior model of the bipolar plates
  • Electrical behavior model

These models, relatively simple individually, will be combined to form a complete model. The difficulty lies in aggregating the different computations in both time and space; one speaks of spatial computing, or of parallel computing when a complex problem is distributed over several processors. In the case of modeling a PEMFC stack, spatial computing is conceivable for the different spatial domains, but parallel computing will be needed to combine all the models and ensure convergence.
Specifications of the study:

  • Definition of the stack under study, based on the available experimental data.
  • Development of the computing codes, making sure they are compatible with running on a cluster.
  • Development of a master model that gathers the different models.
  • Definition of the spatial and temporal partitioning.
  • Validation against experimental data available at the laboratory.

Planned hardware and software:

  • Use of the cluster on the basis of 32 processors for routine use and 64 processors for intensive use
  • Programming of the individual codes and the master code in Python, making the best use of the scientific computing libraries (Scipy, Numpy, FiPy, PyPar). Using proprietary code would entail an exorbitant extra cost in licenses (64 Matlab licenses, for example!)
  • Parallelization will be done using MPI (Message Passing Interface) as implemented in Python.
  • A parallelization solution based on Ipython can also be considered.

Contact:

Daniel Hissel, Modeling team leader
Dpt-ENISYS (Energie, Ingénierie des Systèmes multiphysiques)
Franche-Comté Electronique Mécanique Thermique et Optique - Sciences et Technologies, UMR CNRS 6174
Techn'Hom, 90010 Belfort Cedex, FRANCE
Phone: 33 (0) 3 84 58 36 21
Fax: 33 (0) 3 84 22 27 22
Email: danieL.hissel@univ-fcomte.fr

by gael at July 01, 2009 03:02 PM

Mumbles on object-oriented designs: framework objects and data containers

I recently sent a few thoughts on object-oriented design to a mailing list, so I might as well also be ridiculous on my blog.

I find that in object oriented design, there are two kinds of objects:

  • The first kind is the object encoding logic. This is an object for which clever and complex design holds together the logic of a stateful application. It can often be part of a forest of objects that are linked together via design patterns. The interfaces of these objects are driven by their active role in the application. These objects are prominent in interactive applications. They are mostly particular to an application or a framework, and are mostly implementation-defined.
  • The second kind of object is a data container. It strives to expose a data model that can be of use in various situations, as it is the link between different parts of the code that do not talk to each other other than through data. It is responsible for loose coupling (something that is very important to achieve a maintainable code base) by having a light and shallow interface. It must be interface-designed, rather than implementation-designed. One should very easily get a grasp, an almost physical feeling, for the object by simple interaction with it. I have what I call the ‘explaining test’ for these objects: can I explain fully and completely to somebody what the object does, and any possible caveat, without being sidetracked into special discussions? If not, back to the drawing board: the object will not gain acceptance. In my experience, only the objects of the second kind can easily be shared between different projects.
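
To make the distinction concrete, here is a minimal sketch of the two kinds of objects. Everything in it (the Measurement and AcquisitionController names, their fields and methods) is hypothetical and chosen only for illustration; it is one possible reading of the idea, not a reference design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Measurement:
    """Data container: a shallow interface that passes the 'explaining test'."""
    name: str
    value: float
    unit: str


class AcquisitionController:
    """Framework object: holds state and wires the application logic together."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Measurement], None]] = []
        self._running = False

    def subscribe(self, callback: Callable[[Measurement], None]) -> None:
        # Other parts of the code only ever receive Measurement objects.
        self._listeners.append(callback)

    def start(self) -> None:
        self._running = True

    def push(self, measurement: Measurement) -> None:
        if not self._running:
            raise RuntimeError("controller is not running")
        for callback in self._listeners:
            callback(measurement)


# The consumer knows nothing about the controller internals, only the data model.
received = []
controller = AcquisitionController()
controller.subscribe(received.append)
controller.start()
controller.push(Measurement("temperature", 21.5, "degC"))
```

The controller is tied to this particular application, while Measurement could be reused by any project that needs to exchange a named value with a unit.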

by gael at July 01, 2009 04:13 AM

Fernando Perez

Scipy advanced tutorials results

We recently conducted a poll on Doodle, soliciting feedback on the preferred topics for the advanced track, which is meant to span 2 days with eight 2-hour sessions, each focusing on one specific topic. The table below shows the complete results, which I've only sorted for convenient viewing and anonymized (the raw Doodle output contains the names given by each person voting). If anyone would like the raw spreadsheet, just drop me a line.

The score was computed as #yes-#no (i.e., yes=+1, neutral=0, no=-1), from a total of 30 responses, and the results are in the table below, ranked from highest to lowest score. In my personal opinion, all the topics offered would have made for very good and interesting tutorials, but the point of asking for feedback is obviously to follow it to some degree, which we will now do. I think it's worth noting --though not particularly surprising-- that the ranking roughly follows the generality of the tools: matplotlib and numpy are at the top, with finite elements and graph theory at the bottom. While I personally use NetworkX and love it, it's a specialized tool that for many probably offers no compelling reason to learn it, while pretty much every single numerical python user needs numpy and matplotlib.
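
For anyone who wants to redo the arithmetic, the scoring rule can be sketched in a few lines of Python. The four rows are copied from the results table; the variable and function names are just illustrative.

```python
# (topic, yes, neutral, no): a few rows copied from the results table
votes = [
    ("Graph theory with NetworkX", 5, 9, 16),
    ("Cython", 14, 8, 8),
    ("Advanced numpy", 18, 10, 2),
    ("Sage", 8, 6, 16),
]


def score(yes, neutral, no):
    """yes counts +1, neutral counts 0, no counts -1."""
    return yes - no


# Rank from highest to lowest score.
ranked = sorted(votes, key=lambda row: score(*row[1:]), reverse=True)

for topic, yes, neutral, no in ranked:
    # Every topic received the same 30 responses.
    assert yes + neutral + no == 30
    print("%-30s %4d" % (topic, score(yes, neutral, no)))
```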

We are now in the process of contacting possible speakers for the top topics, and will communicate on the mailing list a final list of topics once we have confirmed speakers for all.

Note: the html formatting of this table is hideous and for some odd reason it drops to the bottom of the page, so you may need to scroll way down to the bottom of this page to see the results table. Sorry. I generated it from OpenOffice and it looks fine in Firefox, but it renders horribly here. If anyone can send me a note on how to fix it (such that I can copy/paste the corrected html), I'll be happy to do so.

Topic | Yes | Neutral | No | Score | Rank
Advanced topics in matplotlib use | 18 | 10 | 2 | 16 | 1
Advanced numpy | 18 | 10 | 2 | 16 | 2
Designing scientific interfaces with Traits | 15 | 11 | 4 | 11 | 3
Mayavi/TVTK | 13 | 11 | 6 | 7 | 4
Cython | 14 | 8 | 8 | 6 | 5
Symbolic computing with sympy | 15 | 6 | 9 | 6 | 6
Statistics with Scipy | 9 | 15 | 6 | 3 | 7
Using GPUs with PyCUDA | 13 | 7 | 10 | 3 | 8
Testing strategies for scientific codes | 11 | 11 | 8 | 3 | 9
Parallel computing in Python and mpi4py | 12 | 8 | 10 | 2 | 10
Sparse Linear Algebra with Scipy | 9 | 12 | 9 | 0 | 11
Structured and record arrays in numpy | 8 | 14 | 8 | 0 | 12
Design patterns for efficient iterator-based scientific codes | 9 | 7 | 14 | -5 | 13
Sage | 8 | 6 | 16 | -8 | 14
The TimeSeries scikit | 4 | 13 | 13 | -9 | 15
Hermes: high order Finite Element Methods | 6 | 9 | 15 | -9 | 16
Graph theory with NetworkX | 5 | 9 | 16 | -11 | 17

by Fernando (noreply@blogger.com) at July 01, 2009 01:37 AM

June 30, 2009

Matthieu Brucher

Book review: C++ Coding Standards: 101 Rules, Guidelines, and Best Practices

There is no official C++ coding standard, unlike several languages (Java, Python, …) that have reference documents for code and design style, good practices, … None existed until this book, in which two world-renowned C++ authors lay the foundations for your everyday development.

101 coding standards, numbered from 0 to 100 (a nod to the fact that C++ starts counting from 0): this is the content of the book.

Content and opinions

The standards are split into several groups, from policy to type safety. Each coding standard is stated, followed by a short summary and then a discussion. There can be an example if needed, and some references. Each standard is simple enough to follow, and the explanation is clear yet complete.

The issues covered are wide-ranging and oriented towards common pitfalls: use inheritance when needed and collaboration elsewhere, do not inherit from a class that isn’t designed for inheritance, … Even when you are familiar with these pitfalls (because a lot of C++ gurus talk about them in their forum posts, mails or blogs), you may sometimes forget them and write code that is not optimal (in terms of performance or maintainability). In that regard, the book is a good way of having good practices classified by topic and easily accessible: you don’t have to check or search the Internet. Finally, if someone asks why you followed a specific coding standard, you can give a full explanation and context (and spread the good practices).

Conclusion

C++ coding standards are sometimes more a question of style than of language, but they are part of the general advice one should follow. C++ is a language that permits a lot of things, perhaps too many, and this set of rules makes it possible to write readable, efficient, robust code.

C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ In-Depth Series) (Paperback)
by Herb Sutter, Andrei Alexandrescu
ISBN: 0321113586


by Matt at June 30, 2009 08:01 AM

June 29, 2009

Enthought

Revamped Plot Toolbar

Last October I added a toolbar for Chaco plots. It was functional, but it wasn’t very pretty. I decided to rewrite it from scratch, with emphasis on improving the appearance and improving the auto-hide feature.

The new toolbar also uses a new feature of Enable: gradients! Gradient support is still a work in progress, but it is improving daily.

PlotToolbar example screenshot

by Bryce Hendrix at June 29, 2009 07:46 PM

June 28, 2009

David Cournapeau

cournape


The numpy 1.3.0 installer for 64-bit Windows does not work very well. On some configurations, numpy does not even import without crashing. The crashes are most likely due to bad interactions between the 64-bit mingw compilers and python (built with Visual Studio 2008). Although I know it works, I had no interest in building numpy with the MS compiler, because gfortran does not work with VS 2008: the fortran runtime from gfortran is incompatible with the VS 2008 C runtime (I get some scary linking errors).

So the choice was between building numpy with the MS compiler, with no hope of getting scipy afterwards, and building a numpy with crashes that are very difficult to track down. Today, I realized that I might get somewhere if I could somehow use gfortran without using the gfortran runtime (e.g. libgfortran.a). I first tried calling a gfortran-built blas/lapack from a C program built with VS 2008, and after a couple of hours, I managed to get it working. Building numpy itself with full blas/lapack was a no-brainer then.

Now, there is the problem of scipy. Since scipy has some fortran code, which itself depends on the gfortran runtime when built with gfortran, I am trying to ‘fake’ a minimal gfortran runtime built with the C compiler. Since this mini runtime is built with the MS compiler and with the same C runtime as used by python, it should work if the runtime is ABI compatible with the gfortran one. As gfortran is open source, this may not be intractable :)

With this technique, I could go relatively far in a short time. Among the packages which build and pass most of the test suite:
 - scipy.fftpack
 - scipy.lapack
 - some scipy.sparse

Some packages like cluster or spatial are not ANSI C compatible, so they fail to build. This should not be too hard to fix. The main problem is scipy.special: the C code is horrible and needs many hacks to build. The Fortran code needs quite a few functions from the fortran runtime, so this needs some work. But ~ 300 unit tests of scipy pass, so this is encouraging.

by cournape at June 28, 2009 12:44 PM

June 27, 2009

Gaël Varoquaux

SciPy abstract submission deadline extended

Greetings,

The conference committee is extending the deadline for abstract
submission for the SciPy 2009 conference by one week.

On Friday, July 3rd, at midnight Pacific, we will turn off abstract
submission on the conference site. Until then, you can modify an
already-submitted abstract or submit new abstracts.

The SciPy 2009 executive committee

  • Jarrod Millman, UC Berkeley, USA (Conference Chair)
  • Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)
  • Stéfan van der Walt, University of Stellenbosch, South Africa (Program Co-Chair)
  • Fernando Pérez, UC Berkeley, USA (Tutorial Chair)

by gael at June 27, 2009 07:14 AM

June 26, 2009

Enthought

Upcoming EPD Webinar: Parallel Processing with IPython

We see our EPD Webinar sessions as a great venue for us to provide subscribers with personalized support. Is there a particular challenge you’ve encountered while using EPD? Do you feel like it would be helpful for us to walk you through a process? We encourage you to submit your questions ahead of time so that we can prepare materials and demos to meet your needs.

The webinar format enables us to respond to your questions (either by chat or VOIP, depending on your preference) and share our screen to provide examples and demonstrations.  We feel that this could become an invaluable channel of communication for EPD users, and are excited to see how it progresses.

July’s webinar will be held next Thursday at 1pm to accommodate the Independence Day weekend. We plan to give an overview and demonstration of parallel processing with IPython, as we’ve seen the tremendous utility of this EPD feature overlooked in the past. Once again, however, if you have a special topic that you’d like to have addressed, feel free to write us an e-mail to tell us what content you’d like to have covered in Thursday’s session.

EPD Webinar: Thursday July 2, 2009
1pm CDT/6pm UTC.
Register at GoToMeeting.  A password to enter the webinar will be provided in your confirmation.

by info@enthought.com at June 26, 2009 09:27 PM

June 23, 2009

Matthieu Brucher

Book review: Ultimate 3D Game Engine Design & Architecture

I bought this book as soon as it was published, and I sold it soon after. Suffice it to say I had a very mixed impression after reading it. There are good things in it, but also some very bad stuff. It doesn’t describe how to write your ultimate game engine, but the author’s game engine. What about some modesty?

Let’s start with the bad stuff.

Content and opinions

This book does not show the engine architecture; the only things you can see are some pictures with some classes. There are hints about UML, but those pictures are far from being UML. Besides, the code quality is really disappointing. No const-correctness, no std::string for the parameters: char* are used and then converted to strings (!!!) Really bad coding practices. Another issue is that classes are instantiated before they are actually used, whereas the text says this shouldn’t be done! Finally, the book has plenty of pages of code, although the code is on the CD or on the Internet, so why use so many pages for something that is easily available?

As for the good things, the book covers almost every aspect of a multiplatform game engine. From inputs to physics as well as graphics (OpenGL and DirectX) and AI, the handling of these systems is there. While some parts show only an abstraction over replaceable libraries (as for graphics), others show the actual implementation, as for physics. Speaking of physics, a good part of the book explains how it works, even if it was not a fully-fledged physics engine at the time of writing.

I regret that the book is about an incomplete and evolving engine. I also regret the script engine part, because it is somewhat inaccurate (compiled or interpreted, a script language can always use a virtual machine) and the pages could have been better spent on exposing the engine architecture…

Conclusion

In conclusion, this book was the first (to my knowledge at that time) complete book on a game engine. There are more complete books on parts of it (mainly 3D engines), but nothing on a complete engine. For this, it deserves some credit. Unfortunately, the drawbacks are too big to consider this book a viable option.

Ultimate 3D Game Engine Design & Architecture (Charles River Media Game Development) (Paperback)
by Allen Sherrod
ISBN: 1584504730


by Matt at June 23, 2009 08:41 AM

June 19, 2009

Gaël Varoquaux

SciPy 2009 conference opened up for registration

We are finally opening the registration for the SciPy 2009 conference. It took us time, but the reason is that we made careful budget estimations to bring the registration cost down.

We are very happy to announce that this year registration to the conference will be only $150, tutorials $100, and students get half price! We made this effort because we hope it will open up the conference to more people, especially students, who often have to finance such a trip on a small budget. As a consequence, however, catering at noon is not included.

This does not mean that we are getting a reduced conference. Quite the contrary: this year we have two keynote speakers. And what speakers: Peter Norvig and Jon Guyer! Peter Norvig is the director of research at Google and Jon Guyer is a research scientist at NIST, in the Thermodynamics and Kinetics Group, where he leads FiPy, a finite volume PDE solver written in Python.

The SciPy 2009 Conference

SciPy 2009, the 8th Python in Science conference, will be held from August 18-23, 2009 at Caltech in Pasadena, CA, USA.

Each year SciPy attracts leading figures in research and scientific software development with Python from a wide range of scientific and engineering disciplines. The focus of the conference is both on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python.

Call for Papers

We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well as academic research face the challenge of mastering IT tools for exploration, modeling and analysis.

We look forward to hearing your recent breakthroughs using Python! Please read the full call for papers.

Important Dates

  • Friday, June 26: Abstracts Due
  • Saturday, July 4: Announce accepted talks, post schedule
  • Friday, July 10: Early Registration ends
  • Tuesday-Wednesday, August 18-19: Tutorials
  • Thursday-Friday, August 20-21: Conference
  • Saturday-Sunday, August 22-23: Sprints
  • Friday, September 4: Papers for proceedings due

The SciPy 2009 executive committee

  • Jarrod Millman, UC Berkeley, USA (Conference Chair)
  • Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)
  • Stéfan van der Walt, University of Stellenbosch, South Africa (Program Co-Chair)
  • Fernando Pérez, UC Berkeley, USA (Tutorial Chair)

Update: I corrected a typo in the original blog post: the sprints are free, the tutorials are $100.

by gael at June 19, 2009 01:53 PM

June 16, 2009

Matthieu Brucher

Book review: The Productive Programmer

What an appetizing title! This book is part of an O’Reilly series that covers a lot of interesting topics. Compared to Beautiful Code, this one is much shorter, and the title suggests it is much more pragmatic.

Each time the author has something to say, he starts with a short phrase that can be disturbing, and then a complete explanation follows. This means you should be ready to be shaken: what the author tries to teach you are sometimes things you have to think about twice (at least I had this feeling).
The book is split in two parts, theory and practice. The first one is the longer, with 4 chapters, but the second has more chapters, with 10.

Content and opinions

So the first part is about theory. A lot of things make sense, and even if you haven’t read the book, you should know them. Acceleration is about how you can do things faster. Well, it’s less about developing than about setting up your work environment (search utilities, shells, …) so that you can access things faster. Focus is about maintaining focus during development: fewer distractions so that you can stay focused (you need 15 minutes to reach maximum productivity, so being interrupted every 5 minutes is a problem), more space on your desktop with several monitors or virtual desktops (the book gives links to useful tools, like virtual desktops for Windows), … Automation is linked to acceleration, but it goes further: it is about tasks you do several times. Different tools (bash, ruby, …) are used in different examples. What I mainly remember is that you need to master several different tools. The last chapter is Canonicality, or the DRY principle (Don’t Repeat Yourself). There are several cases where you seem to need the same thing in several places (when maintaining UML diagrams, for instance), and by using automation you can achieve canonicality. This is spread over several pages, and although it is natural to use automation to achieve canonicality, you have to realize it first.

The second part is about practice, and more exactly about which tools and processes you should use, based on the author’s experience. It seems to be mainly based on agile patterns, starting with test-driven development, followed by several topics on software architecture that come back regularly in the following chapters (a good thing IMO). Two chapters are about the old times: one dedicated to the good that can be extracted from past experiences, and one dedicated to bad experiences that keep on harming development (mainly people who think they know the truth without questioning themselves, the so-called angry monkeys). The last two chapters are about using the right language for the application and using the right tool (the IDE in this case) for maximum productivity. This part about the actual ways of being more productive achieves, IMHO, its goal, with good impulses for a productive programmer.

Conclusion

A lot of the tools that are presented in the first part are not free or open source, and sometimes alternatives exist in the free software community, so you have to look for them. The second part tends to look more towards those free tools.
The whole book is definitely about good practices, some more controversial than others. Some of them are also difficult to apply to every language, as the tools for the practice are missing. I think the book describes a good set of practices to try to apply personally; at least, I’ve decided to try, and to make better use of the command line and my IDEs (and also to fight angry monkeys).

The Productive Programmer (Theory in Practice (O’Reilly)) (Paperback)
by Neal Ford
ISBN: 0596519788


by Matt at June 16, 2009 08:49 AM

June 14, 2009

Gaël Varoquaux

Fuzzy on OOP and the French

Fantastic:

Haha - I shake my fuzzywuzzy beard at you in bewilderment. Do you people dislike OOP, the class statement is mere boilerplate to you, I mumble incoherent French obscenities in your general direction. (Did you know the French acronym for object-oriented programming is POO?).

by gael at June 14, 2009 09:38 AM

June 12, 2009

Enthought

June 19 Public Webinar on Python for Scientific Computing

I had such a blast at the last public webinar that we did to promote Python for Scientific Computing. I am really looking forward to the next webinar which is only a week away (June 19). We had 100 people attend the last one. I know that some who wanted to attend could not because of a mix-up on times, or a problem with the fact that GoToMeeting doesn’t support Linux (I’m not very happy about that, but I don’t see another option right now). I apologize for all those problems, but hope you will try to attend again.

There is a lot that we could cover in these webinars, and I’m anxious for your feedback about what you would like to see. My plan is to put a schedule together so the topics are listed through the end of December after this next webinar. Now is the chance to make your opinion known if you’d like to steer these webinars in a particular direction. Schedules are busy and varied, so I’d like to give plenty of notice so that more people can attend the webinar they are most interested in.

In this upcoming webinar we are going to provide an introduction and demo of Chaco (which we didn’t get to the last time). If there is time, I will also continue the Mayavi demonstration (particularly the mlab interface) that we started last time, but I also want to show off EPDLab to a wider audience. You can register for the webinar at https://www1.gotomeeting.com/register/303689873

In the EPD subscriber webinar on June 5th, we discussed EPDLab (an open-source interactive Python environment included as part of EPD). Because EPDLab is a free and open-source project that anyone can participate in, contribute code to, and use as they would like, I think it deserves some attention at this next public webinar. Not only does it provide an enhanced scientific computing environment, it also provides an introduction to the Enthought Tool Suite (a free and open-source collection of tools for building compelling scientific applications — it goes by the abbreviation ETS).

I hope you will excuse a brief aside to clarify ETS and its relationship to EPD. Because we do sell a binary distribution of Python tools called the Enthought Python Distribution (EPD — which also happens to contain ETS), there is sometimes some confusion regarding the license and availability of ETS. ETS is a large BSD-licensed open-source collection of tools with a public SVN repository that anyone can contribute to and participate in the development of. Enthought has released a lot of code in that library which has made it possible for us to write sophisticated, compelling, and attractive scientific computing applications for our customers. ETS contains multiple separate projects. The most important and developed of these projects are Mayavi, Chaco, Traits, TraitsUI, and Envisage. You can learn more about ETS at Enthought’s open source portal.

But Enthought is a small company, and the majority of our marketing effort right now is centered around getting the word out about EPD and our other products and services like training and custom software creation. We don’t have the manpower to advertise ETS very well at the same time, and it can be a little confusing that EPD the distribution does cost money for commercial use, but ETS is free and open-source. Fortunately, people like Gael Varoquaux and Prabhu Ramachandran lead the internet charge to spread the word about the great tools in ETS.

I’m looking forward to seeing many of you on-line again at 1:00pm (Central Daylight Time) on Friday, June 19th. Slides and a recording of the webinar will also be made available here after the conclusion of the webinar.

by Travis Oliphant at June 12, 2009 04:27 PM

June 11, 2009

June 09, 2009

Matthieu Brucher

Review of Intel Parallel Studio (beta)

Since this post, Intel has officially released Parallel Studio. This is why I’ve published a new, up-to-date review here.


by Matt at June 09, 2009 08:33 AM

June 07, 2009

Gaël Varoquaux

Job offering for junior Python developer

Our lab is seeking to hire an engineer to work on porting our machine learning code to the scikit learn, adding tests and documentation and packaging it.

We are looking for someone motivated by quality in software and open source. No prior scientific computing experience is required. You will be working in a highly stimulating research environment (Neurospin), near Paris and employed by the French research institute in computer science and applied math (INRIA), a prestigious institution.

Neurospin is a research institute dedicated to the understanding of the brain. You will be working with the computer-assisted neurology laboratory, the image-analysis branch of Neurospin, in the small ‘Parietal’ INRIA team embedded in NeuroSpin and dedicated to statistical modeling.

Over the years, the lab has developed a set of tools for machine learning and statistical analysis in Python (with some C). There are some tools for this purpose available in the open-source world (BSD-licensed) in the scikit learn. We want to extract the good and unique parts of our internal library, and release them in the open source world through the scikit learn. Our code is fully Python code, using scipy and matlab, with some bindings to R. As we want the code to be BSD-licensed, we will remove the bindings to R, and replace them when possible. The job does not involve developing new algorithms, but testing, improving, and documenting the existing ones. There is a lot of quality assurance work to be done. The code needs to be brought up to the right coding standards; APIs should be cleaned; tests added. Dead code should be deleted. There is some optimization work to be done. Also, if there is any functionality duplicated with the scikit learn, you should analyse both implementations and determine which one to keep. The job also involves working with the community, documenting the code, and releasing the project, including binary packages. And finally, all the original authors of the algorithms, and experts in the field, are in the lab. So you will be able to learn from them and pester them if there is a problem with the code.

In a word, this is about transforming an internal project into a leading open source project that will rock and live on!

The job description is available here.

There are two caveats: first, it is a 2-year position. Second, you need to have graduated recently (how recently I don’t know exactly, but I will inquire).

If you are interested, or just want to ask questions, don’t hesitate to send me an e-mail. I am _really_ looking forward to collaborating with someone motivated on this project.

UPDATE: I have more details on the restrictions of the job offering: you need to have graduated in 2008 or 2009. This is a very hard restriction, and I am receiving many excellent CVs that I cannot even consider because of it. I am sorry, I cannot do anything about it.

by gael at June 07, 2009 06:53 PM

June 05, 2009

Enthought

First EPD Webinar

I wanted to thank everybody who came to the first EPD Webinar, which was held today at 1:00pm CDT in our offices in Austin. We had a few technical glitches which our team resolved quickly. I then spent about 30 minutes showing off new features of EPD such as EPDLab, indexed searching of docstrings with Whoosh, and the new curve_fit function from scipy.optimize. Dave Peterson then spent about 30 minutes showing the use of enpkg, which is a command in EPD that provides update, upgrade, and rollback service for egg-packages. This tool should allow subscribers to EPD to keep up to date without having to download and install new installers every time a release is made.

A packaging tool like enpkg has been on the roadmap since the beginning, and it was encouraging to see it in action. There are still a few speed and “verbosity” issues that we are cleaning up, but it looks like a good start to what should be a very useful feature for EPD.

In the future, the EPD webinars will contain about 30-45 minutes of training material drawn from the 7 days of course material on scientific computing with Python that we teach regularly. If there are particular points you would like to see covered, please let us know at info@enthought.com. The current plan for the next EPD Webinar is to provide training on the statistical capabilities of SciPy.

All who attended had the chance to ask questions directly of the Enthought attendees. We look forward to answering more of your questions in the future. The next EPD webinar is scheduled for Friday, July 3 at 1:00pm CDT. If you are an EPD Subscriber at the Basic level, please register. We look forward to your attendance and questions. Feel free to send pre-webinar questions to info@enthought.com.

The next public webinar for general scientific computing with Python is Friday, June 19 at 1:00pm CDT. This webinar is open to all who would like to attend. Right now the plan is to show off the open-source EPDLab and give an overview of all the tools that are brought together in EPD. You can sign up now for the event. I look forward to seeing many of you attend.

by Travis Oliphant at June 05, 2009 10:11 PM

June 04, 2009

Enthought

EPD: Aiming for x86_64 OS X builds

We’ve recently made the decision to start applying resources to generating x86_64 OS X builds of EPD.  Because of limited resources, this means we’re officially dropping PPC support in EPD.  It also means that it may take us months to get things released for the x86_64 (also known as amd64) architecture.

As an example of some of the issues we’ll face, we’ll need to decide how to handle the GUI backend situation.  You see, the wxWidgets project hasn’t yet released 64-bit build support for OS X’s Cocoa framework, and the Carbon framework isn’t 64-bit, so we’re stuck either starting with a “server” / console build of EPD, shipping on an unreleased version of wxWidgets, delaying the release while we help finalize x86_64 Cocoa builds of wxWidgets and wxPython, or switching to a different backend like Qt.

While Qt and PyQt (the Python bindings for Qt) seem like a no-brainer technology-wise, the license situation is a hurdle for us to overcome.  We’ve tried hard, but haven’t always succeeded, to avoid GPL licensed projects in EPD in order to make it more palatable to commercial users, including our own consulting projects.  And, yes, Qt itself recently came out with an LGPL license option that would suit EPD’s needs, but PyQt isn’t similarly licensed (yet).  So now we have to decide whether it would be acceptable for such a core capability (the only GUI backend for OS X x86_64) to be GPL licensed.

If anyone has any thoughts or suggestions on how to resolve this issue, please don’t hesitate to let us know!

By the way, regarding the PPC situation, we have effectively already started to drop PPC support with the EPD Py25 v4.3.0 release.  We made a good faith effort to build in the PPC support but simply didn’t do significant testing of the results.  In the end, it turns out that at least one core module, SciPy, ended up with binaries that don’t fully support PPC.  Sorry, but we do not plan to issue fixes for this.

If you’re a PPC user, the last working version of EPD for PPC was EPD Py25 v4.2.30201.

by Dave Peterson at June 04, 2009 05:09 PM

Matthieu Brucher

Book review: Game Guru: Strategy Games

Strategy games are the type of games I prefer. Turn-based or real-time, they share some common ground. This book tries to explain them.

Content and opinions

The book is split into three parts: war, peace and design.

War is generally at the center of a strategy game (war, or tensions that can lead to it). War consists of units fighting together. Equilibrium is something that is not easy to achieve, and the game must suggest different strategies with those units to remain interesting. Relevant facts about those topics are introduced.

The Peace chapter is not only about peace, perhaps only truce before the war. It’s about resources, technologies or stuff like that. It is not mentioned that too many different resources can lead to a bad game as well (too complicated to handle, an AI that cannot cope with it can ruin the game, …)

Finally, almost half of the book is dedicated to design. Not only interface design, but also game and gameplay design. Do I need a hero? How far does the world stretch? These kinds of questions can lead to really different games, and a game that aims too big can end up a failure because of them.

Conclusion

This small book can be read in a few hours. It is mainly based on actual game images, but their captions are too small, especially when they convey useful information. This problem aside, the book is really enjoyable. It does not go into too many details and only aims to give a first overview of what a strategy game should be.

Game Guru: Strategy Games (Paperback)
by Dave Morris
ISBN: 1592002536

Price:
21 used & new available from USD 8.95

| 4 | 1


by Matt at June 04, 2009 08:03 AM

June 03, 2009

Titus Brown

Seeking: independent study student for tech reporting on Python

I'd like to find an MSU student to report semi-monthly on python-dev. The student would be responsible for monitoring the python-dev mailing list and active PEPs, summarizing substantive discussions in a public forum, and integrating feedback from the community. This would be a 1 credit CSE independent study course (CSE 490). Additional effort (for more credits) could be applied towards building and maintaining a CMS site to store and reference past and present summaries, or integrating reviews of new modules.

The ideal student would be someone who communicates well in writing, is interested in technical reporting, and has some basic experience with programming. Python experience (CSE 231) is a plus.

Please send a brief summary of interests together with a sample of writing to ctb@msu.edu.

--titus

June 03, 2009 02:12 PM

Hey look, it works!

Apparently the ipaddr module in Python 3.1 is disliked by some, and there was a reasonably robust discussion on python-dev about how it's wrong, wrong, wrong. Guido finally ruled: ixnay on the addr-pay.

This is pretty relevant given the twitstorm caused by Zed Shaw's ludicrously self-confident rants about how he always knows best and is a kickass programmer and oh, by the way, the Python stdlib is kinda lousy in places. I think the thing to take away from Zed's rant is that the Python module addition process is, in fact, moderately FUBARed, with some people able to add perhaps ill-considered modules while others have to struggle to get the time of day. (Aahz's solution is good -- require a PEP.)

It's relevant personally, too, as I dig my way through some of pygr's modules. It's way easier to add code than it is to refactor it, especially if you don't have a lot of unit tests; if you want to retain backwards compatibility, you're basically doomed. DOOOMED, I say! And that's why the Python stdlib has so many issues.

(Incidentally, nothing against Zed Shaw -- obnoxiousness is his public persona, and he's definitely worth listening to -- but it is funny to realize that all his articles contain arguments that boil down to "he always knows best and is a kickass programmer." I especially liked his statistics rant.)

--titus

June 03, 2009 02:57 AM

June 02, 2009

Matthieu Brucher

A quick hack to use the MKL with numpy/scipy on Linux

I promised to post an update whenever I found a solution to the problem I had some months ago when I tried to use the latest MKL with numpy. Well, there is a simple hack that does the trick. It is far from perfect, but at least the tests pass now.
So the only thing you have to do is to export the LD_PRELOAD variable:

export LD_PRELOAD=/path/to/the/MKL/lib/libmkl_core.so
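The variable has to be in the environment before the Python interpreter starts, because the dynamic loader reads it at process start-up; assigning to os.environ inside an already running interpreter will not preload anything into that interpreter. As a minimal sketch of applying the same hack from a launcher script, the snippet below builds an environment for a child process. The helper names preload_env and run_with_preload are made up for this illustration, and the path is the placeholder from the export line above, not a real location.

```python
import os
import subprocess

# Placeholder path copied from the export line; replace it with the
# actual location of the MKL on your machine.
MKL_CORE = "/path/to/the/MKL/lib/libmkl_core.so"


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD.

    Hypothetical helper: it only prepares a dictionary and does not touch
    the current process.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The loader accepts a colon-separated list of libraries.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_with_preload(command, library=MKL_CORE):
    """Run `command` (a list of arguments) in a child process with the preload set."""
    return subprocess.run(command, env=preload_env(library), check=False)
```

For example, run_with_preload([sys.executable, "-c", "import numpy; numpy.test()"]) (after importing sys) would start a fresh interpreter with the library preloaded and run the numpy tests in it, assuming numpy is installed.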


by Matt at June 02, 2009 08:17 AM

June 01, 2009

Enthought

The EPD repository has RSS feeds

We’ve just updated the EPD product website with links to a new EPD repository RSS feed: http://www.enthought.com/products/epd-rss.xml.

This feed is updated every time we release an update / upgrade to a project included in EPD, and soon every time we publish a new installer.  All entries into the feed specify the platform the update / upgrade was released for, the date and time of release, the name of the thing being updated / upgraded, and a short description of the project being updated.

We’ve added this feed to both the main EPD product page, and also the ‘Download’ page — they’re the same feed so no need to subscribe twice!  For those with browsers that recognize RSS declarations in the page/HTML headers, you’ll see the RSS feed icon in your browser’s URL field when visiting these pages.   For others, simply click on the explicit RSS link on the main page right under the big, red, “Download Now” button.

As an alternative to this all-platforms feed which is available to everyone, even those without an EPD subscription at all, current EPD subscribers with access to the repository can subscribe to a filtered RSS feed for their target platform(s).  These can be found within the  various platform-specific directories under the top-level ‘eggs’ directory.  Note that only those with ‘Basic’ subscriptions and above can access the repository.

Please don’t hesitate to send us suggestions on how we can continue to improve EPD!

by Dave Peterson at June 01, 2009 09:18 PM

May 26, 2009

Enthought

Webinar recording available

Enthought’s first public webinar was a success, despite a few technical glitches. The large attendance in spite of the short notice was gratifying. Travis was able to cover only a fraction of the material he had hoped to cover, so there is plenty of material for future sessions (such as Chaco and Mayavi).

The recording of the webinar is now available for download. While the native recording format of GoToWebinar is Windows Media Player, we have converted it to Matroska format, so we hope that folks on all platforms will be able to view it. Please let us know if you have problems getting or playing it.

The next public webinar will be Friday, June 19 at 1:00pm CDT. Specific topics TBD.

Meanwhile, we are launching a second webinar series, exclusively for subscribers to the Enthought Python Distribution at the Basic support level or higher.  Those subscribers will receive an e-mail announcement shortly.

by Janet Swisher at May 26, 2009 08:58 PM

Matthieu Brucher

Book review: IronPython in Action

IronPython is the first dynamic language developed for the .Net platform. At first, .Net didn’t support this kind of language. This is something that keeps coming back throughout the book: you have to use some additional tricks to unleash the power of .Net dynamic and static languages.

Content and opinions

The book starts with a general introduction to IronPython. A quick review of the language itself is followed by the use of the .Net assemblies. At the end of this part, one is comfortable enough to write some small IronPython programs.
The next part is dedicated to what IronPython offers thanks to Python and to its .Net affiliation. The authors go through standard Python (batteries included) and the somewhat associated .Net assemblies (some arguments for using one or the other would have been a big plus to the explanations), depending on what must be done. Because of (or thanks to) .Net, several pages are dedicated to XML, as it is needed to simplify the description of UIs. Several useful design patterns are also presented with the .Net approach.
The next part starts with WPF, the official graphical interface, with several ways of using it (bridge from C#, XAML, …). Then WMI (used for system administration) is handled, but from my point of view, it is the weirdest part. WMI has its own language, which does not look like C# or Python. Besides, PowerShell, also presented as a way of doing system administration, has its own language. There is a book dedicated to PowerShell, so only the communication between IronPython and PowerShell is handled. So there are two additional languages in this chapter, perhaps too many (they are limited to this chapter).
IronPython is a .Net language, so it is possible to do ASP with it. A chapter deals with this approach; it is well written, but you need to follow the associated example in your favorite IDE if you want to follow what’s happening. Web also means web services and databases, handled in one chapter. The basics of SQL tools are addressed, as well as basic web services (mainly REST). I have to say that there are some mistakes there, as SOAP is not only used with POST HTTP requests but also with GET requests (it can be seen in the official W3C specification) and also with transport protocols other than HTTP. Perhaps these are limitations of the .Net implementation, in which case it should have been mentioned. Finally, Silverlight integration allows developing light clients that can interact with other languages as well as the web page.
Throughout the book, complete interaction with other .Net languages was not addressed. It is the goal of the last part to show how assemblies can be used in IronPython and how IronPython scripts can be used from .Net static languages. As I’ve said, the interaction does not go completely smoothly, and there are several solutions to accomplish it. At least, the book does not only speak about the upcoming .Net 4.0 that will help this interaction.

Conclusion

In conclusion, those who need a dynamic language (to script an application) can go for IronPython, the first dynamic language for the .Net framework, compatible with the Python 2.5 language, and in that case, go for this book, which will help you with everything.

IronPython in Action (Paperback)
by Michael Foord, Christian Muirhead
ISBN: 1933988339

Price: USD 29.69
44 used & new available from USD 21.50

| 4.5 | 7


by Matt at May 26, 2009 08:30 AM

May 25, 2009

David Cournapeau

cournape


I have heard several times that every Linux distribution should have the same package manager (the implication being that, between rpm and deb, there is one too many), and it was mentioned once again recently in a well publicized video (see the Linux Hater blog).

The argument goes as follows: doing packaging takes time, and making packages for every distribution is a waste of time. If every distribution used the same package system, it would be much better for 3rd party distributors. Many people answer that competition is good, having many distributions is what makes Linux great – [insert usual stuff about how good Linux is].

While it is true that multiple package systems mean more work, saying that there should only be one is kinda clueless – I wonder if anyone pushing for this has ever done any rpm/deb packaging. What makes deb vs rpm a problem is not that they are different formats, like say zip vs gzip, but that they are deployed on different systems. A RHEL rpm won’t work great on Mandrake, and even if a lot of Debian .debs work on Ubuntu, it is not always ideal. The problem is that each distribution-specific package needs to be designed for the target distribution. To build an rpm or a deb package, you need:

  • To decide where to put what
  • To encode the exact versions for the dependencies
  • To decide how to handle configuration files, set up start/stop scripts for servers, etc…

Basically, almost everything that makes the difference between distribution A and distribution B! For file locations, the LSB tries to standardize on this, but some things are different, like where to put 64 vs 32 bit libraries. One distribution may have libfoo 1.2, another one 1.3, so even if they are compatible, you can’t use the same package for every distribution. Or some libraries do not have the same name under different distributions.

So requesting the same package manager for every distribution is almost equivalent to asking that every distribution should be the same. You can’t have one without the other. You can argue that there should be only one distribution, but don’t forget that Ubuntu appeared like 5 years ago.

by cournape at May 25, 2009 11:16 AM

May 20, 2009

Enthought

Python for Scientific Computing Webinar

We are trying something new at Enthought. I’m going to host the first Enthought Webinar on Scientific Computing. This webinar is free for people interested in showing up. You should plan to come with a bit of patience as we may not have all the wrinkles worked out of the technology and what it means for having a discussion.

I want to spread the word about all the very cool tools that the Open Source community has produced for using Python in Science. The first webinar will be on Friday at 3:00pm CDT. You can attend via your computer by registering at the following link: Python for Scientific Computing Webinar. I will be talking about NumPy, structured data-types, and memory mapped arrays (and how to use them for reading data quickly from files). I will also be showing off Chaco for 2-d interactive visualization and Mayavi for 3-d visualization. Come with questions as you will have the opportunity to ask them if you would like.

We will start 15 minutes early for people who want to get help setting up with the Webinar technology from GoToMeeting. If you have never attended a Webinar before, you may want to come and try it out. I look forward to seeing many of you online this Friday.

by Travis Oliphant at May 20, 2009 11:10 PM

May 19, 2009

Matthieu Brucher

Interactive RayTracer

Some months ago, I decided to dig into raytracing, and more exactly interactive raytracing. So I’ve started writing my own library, based on several publications.
nVidia recently announced its own framework and Intel also wants to do raytracing on Larrabee: it is the current trend.

First, raytracing is a tool to draw a picture from a scene. It’s like a camera observing the scene, and to know the content of each pixel on the film, a ray is cast through the scene. Depending on the objects, rays will propagate to other objects or lights, and the actual color of the camera pixel can then be computed.

A raytracer is in fact really easy to write. The math formulae are simple for a version that will return satisfactory results. In contrast, an interactive or “real-time” one is more complicated. This is why I started my small project and why I will blog about it from time to time.

Interactive or real-time means that the drawing must be fast, really fast. This also means that realistic drawing is out of the question. I’ve started with a simple raytracer, with some optimizations, and then I’ve added some algorithmic optimizations. I will talk about all this in several posts.

How does a raytracer actually work?

In physical vision, light rays travel around the world from the sources, hitting objects, being reflected or transmitted, and finally arriving at the eye (or camera). The principle of raytracing is to do it backwards, from the camera to the light sources. The camera film, or the retina, is symbolized by a screen.
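To make the backwards principle concrete, here is a small Python sketch of how one ray per screen pixel could be generated. It is only an illustration and not code from the library: the function name primary_ray, the camera placed at the origin looking down the negative z axis, the screen at z = -1 and the 60 degree field of view are all assumptions chosen for the example.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def primary_ray(px, py, width, height, fov_deg=60.0):
    """Return (origin, direction) of the ray through pixel (px, py).

    Assumed setup: camera at the origin, looking down -z, with the
    screen placed at z = -1 and fov_deg as the vertical field of view.
    """
    aspect = width / height
    half = math.tan(math.radians(fov_deg) / 2.0)
    # Map the pixel centre to screen coordinates, with y pointing up.
    x = (2.0 * (px + 0.5) / width - 1.0) * aspect * half
    y = (1.0 - 2.0 * (py + 0.5) / height) * half
    return (0.0, 0.0, 0.0), normalize((x, y, -1.0))
```

The ray through the centre pixel of the screen points straight ahead, along (0, 0, -1), and the rays for the other pixels fan out around it.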

How a raytracer works

In this case, the emitted ray hits object 1 with some angle to the normal (the normal vector is the vector orthogonal to the tangent plane of the object). Object 1 gets some light and so the color of the ray depends on this light, the object and the angle between the normal and the direction of the light.
The ray is then reflected to object 2 and hits it with another angle. Like object 1, object 2 gets some light and the reflected ray carries a specific color. This color is then blended with the color of the primary ray, and this final color is the color of the pixel on the screen.
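The steps above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the code of the library: the objects are taken to be spheres, the dependence on the angle between the normal and the light is modelled with a plain cosine (Lambert) term, and the blending weight of 0.3 is an arbitrary value picked for the example. All function names are made up here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(v, s):
    return tuple(x * s for x in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    return scale(v, 1.0 / math.sqrt(dot(v, v)))


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:  # ignore hits behind (or exactly at) the origin
            return t
    return None


def lambert(normal, to_light):
    """Diffuse intensity from the angle between the normal and the light."""
    return max(0.0, dot(normal, to_light))


def reflect(direction, normal):
    """Mirror a direction about a unit normal."""
    return sub(direction, scale(normal, 2.0 * dot(direction, normal)))


def blend(primary, secondary, weight=0.3):
    """Mix the reflected ray's color into the primary ray's color."""
    return tuple((1.0 - weight) * p + weight * s
                 for p, s in zip(primary, secondary))
```

For a ray leaving the origin along (0, 0, -1) toward a sphere of radius 1 centred at (0, 0, -5), hit_sphere gives a distance of 4, the normal at the hit point is (0, 0, 1), and reflect sends the ray straight back along (0, 0, 1).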

Coming next

I’ve written my library in C++ and tested it with several compilers on different platforms. To test the code, profile it and create a scene in an elegant manner, I’ve added a small Python wrapper. This will be the subject of my next post.

Then, I’ll go through secondary and shadow rays, a GUI for the raytracer, acceleration structures, …

The code is released under the LGPL on Launchpad.

Additional links

Interactive Raytracer on Intel website

An Intel article on real-time raytracer architecture

nVidia presentation on raytracing


by Matt at May 19, 2009 08:28 AM

May 18, 2009

Barry Wark

Moving...

The Physion Consulting blog will be moving to blog.physionconsulting.com. Please join us there.

by Barry (noreply@blogger.com) at May 18, 2009 10:39 AM

May 17, 2009

Titus Brown

Upgrading PlanetPlanet.

OK Folks, I know that planet.python.org and planetpython.org underwent a merger, and during the merger a new, or patched, or somehow upgraded version of planet went into effect on both. However, I cannot find a link to the info post any more.

I would like to put the latest stable version of PlanetPlanet into effect on the Google Summer of Code/Python site but I am wary of using the devel repo without any inside info. (I am currently running 2.0, which is the latest official release.)

Should I use the dev repo, or should I track down whatever version planet.python.org is using?

thanks!

--titus

May 17, 2009 07:53 PM

May 16, 2009

Gaël Varoquaux

Pycon FR: presentations and tutorials

On May 30th and 31st, the French Python conference, Pycon FR, will be held at ‘la Cité des sciences’, la Villette, in Paris.

The first day, I will be giving a one-hour-long tutorial (in French) on numpy, scipy, and all the Python for Science jazz. On the following day, I will be giving a half-hour-long talk to illustrate the use of Python in my current work: statistical analysis and modelling of brain activity.

I’ll be giving my tutorial in one room, while David Larlet (the famous Biologeek) will be giving one on Django in another room. Tough competition :-P .

The program of the conference is very eclectic, ranging from general programming talks, to GUIs or web development. While this might deter the pure scientific computing folks, I strongly encourage you to attend. Indeed, a lot of the development, packaging, quality assurance, … problems encountered in scientific computing are universal in computing.

You might think that you are only interested in writing algorithms, or processing data, but this code will have to live on. My experience is that it is terribly hard to have code in a lab that can be somewhat shared and live on when people move away to another lab, or stop having time to maintain the code. Talks like

can probably be of some use.

Also, don’t underestimate the fact that some other communities might have solved some of the issues you struggle with. When dealing with real-world problems, and not only developing algorithms on a few sets of test data, a large fraction of the code lines are related to IO, interfaces, data massaging… Two years ago, I remember that I was not terribly interested in the web-development talks. I tried to be open-minded and listen to them, but… Now I have done a bit of web development myself, and I have played with some of the famous ‘web frameworks’. I can tell you, there are some really interesting concepts there. The web guys have managed to extract a set of patterns from the problems they face and provide excellent abstractions for data handling and display. Can we learn from them? I am especially interested in getting more insight from things like ORMs (object relational mappers), and understanding better the web frameworks:

And finally, one more reason to come: it is so nice to actually get to meet people in real life, and have a chat.

So, see you there, for those who live in France.

by gael at May 16, 2009 03:25 PM

Titus Brown

Easily Accessible Web-Based Tools For Analyzing Next-Generation Sequencing Data From Agricultural Animals

Just submitted this on Thursday:

Next generation sequencers are beginning to impact agricultural biology. Over the next few years, next generation sequencing will produce incredibly large datasets that will address structural (e.g., SNPs, CNVs, indels, methylation, translocations) and functional (e.g., RNA expression, transcription factor binding sites) variation in genomes that will provide detailed insights that could explain phenotypic variation. Despite this immense power, next generation sequencing in agricultural animals will not be used effectively due to the lack of easy-to-use computational tools to support data analysis, and the unique needs of agricultural animal genomes. We propose to build an easy-to-use Web interface that incorporates several existing mapping and post-mapping analysis programs for next generation sequencing data that will greatly empower agricultural researchers. We will also provide solutions to issues such as unfinished and unannotated assemblies, private data sets, private annotations, etc. Our tools will give individual investigators or small groups with no computational support the power to utilize and interpret next generation sequencing data.

Any guess as to the funding agency? Yep....

The exciting life of a professor continues!

--titus

May 16, 2009 04:34 AM

May 15, 2009

Titus Brown

Proposal: the Python Buildhaus

I just submitted a Mellon Award for Tech Collaboration nomination for the Python Buildhaus. What's that, you ask?

The Python Buildhaus is a project to systematically build, test and release Open Source Python packages on Windows, Mac OS X, and a wide array of other UNIX architectures and operating systems (see snakebite.org for list). In addition to providing machine access, software support, and process support, we hope to create a set of best practices and process documentation to help the community address cross-platform compatibility issues. We will also build tools to extend the impact of this effort beyond Michigan State by providing longer-lasting developer resources, e.g. tools to auto-build Python eggs and installers across multiple platforms.

This will be an open resource for the Python community.

See the Python Buildhaus and our proposal.

This is basically an attempt to use Snakebite specifically to help with the cross-platform distribution problem.

--titus

May 15, 2009 06:39 PM

May 12, 2009

Enthought

Enthought and Economic Science

A few of us at Enthought (Peter Wang, Robert Kern, and I) traveled to Toronto two weeks ago to attend a very interesting summit of scientists and others connected to finance and economics to discuss whether and how science can provide assistance in understanding economics sufficiently to prevent or at least mitigate economic breakdowns such as the one we’ve just experienced (and are still dealing with). The conference was titled The Economic Crisis and its Implications for the Science of Economics. Some background material for the conference can be read at Edge.org, and at least two blog-posts covering the conference can be read: one by Stephen Hsu and another by Barkley Rosser.

We were invited because Eric Weinstein is a fan of Python and the tools in the Enthought Python Distribution (including NumPy, SciPy, SymPy, MayaVi, and Chaco). Robert Kern produced some very nice visualizations for Eric’s talks in the conference using MayaVi and Chaco, which can be seen in Eric Weinstein’s two talks: 30 minutes into the first one, and 45 minutes and again 1:30 into the second one. (Actually, the second talk was Pia Malaney’s talk and Eric enthusiastically joined her half-way through; I guess being married has its advantages for getting more air time.)

The conference was intellectually stimulating and very enjoyable. I enjoyed all of the conversations I personally had with the participants, which ranged from probability theory to cognitive neuroscience to quantum mechanics to computer platforms for agent-based modeling. I encourage you to read and listen in more depth to what the participants had to say in their talks, because I won’t be able to provide a sufficient summary of the conference. All of the conference talks are online. What isn’t shown in the videos, though, are the break-out discussions that took place between sessions and at meal-time.

In these break-out discussions I enjoyed getting to know all four members of the PartEcon team. Apparently, Mike Brown organized this group after an agent-based model (discussed at the conference by Alexander (Sasha) Outkin) predicted some useful results of changing the tick-size to decimals on the NASDAQ. They have incorporated principles of double-entry book-keeping into their agent-based model. They also stayed after the conference to continue comparing notes with another team from the Perimeter Institute that had written about an agent-based model using a more formal setup (by Samuel Vazquez and Simone Severini).

While there was one early talk on the first day by Richard Alexander that touched on the genetic component of human agents, the impact of having evolutionary biologists present (like him and one of his students, Bret Weinstein) was much larger than their presentation footprint. They provided insightful discussions during several break-out sessions (Peter Wang even commented that in another life he might have become a biologist).

Lee Smolin sent around a very nice summary of the conference and suggested a unifying theme of “path-dependence in economic dynamics.” Eric and Lee were both there to explain how gauge theory provides the tools to solve the problem of changing preferences that has plagued traditional academic economics. Eric did a great job of showing how this manifestly untrue concept of unchanging preferences has at least been put forward by several leading economists. It’s still unclear to me whether or not gauge theory actually provides new results, but it definitely seems like a more useful mathematical toolbox to use and build from.

I was disappointed that amidst all the discussion of the failure of economic modeling there was not at least some discussion about the Mises-Rothbardian ideas of fiat currency and fractional-reserve banking being the primary source of the booms and resulting busts. I wanted to learn from the people there rather than try and debate this one particular theory of economics, so I pretty much stayed quiet. One gentleman sitting next to me during the first day asked the panel whether the crisis shows the failure of fiat currency and got a very unsatisfying answer from Nouriel Roubini that simply dismissed the question, but did not really address it.

Given that the economic experts have basically shown repeatedly they don’t know what they are doing, intellectual honesty would seem to me to require listening to all sides of a debate, instead of dismissing a whole theory of economics (such as the Austrian school) primarily because it doesn’t use math as its starting point. Fortunately, there are very good texts that argue against fractional-reserve banking and the role it may actually play in causing economic instability. One of them is “Money, Bank Credit, and Economic Cycles” by Jesus Huerta de Soto.

I really enjoyed the conference because it seemed to combine all of the interests I’ve developed over the years: math, probability theory, neuroscience, economics, and computers. I’ve had a hobbyist interest in economics ever since graduate school at the Mayo Clinic, when I was learning about Linux and Python. I fell in love with open source software but wanted to understand how “giving software away” could work sustainably in a society. It was this question that finally led me to read Mises and Rothbard and a whole host of other non-mainstream economists. I can’t say I’ve figured anything out, but I have very much enjoyed the ride.

I’m also very hopeful about some of the ideas I saw at the conference that may help us inch closer to an understanding of the truth of an economic system (mathematically modeling changing preferences, using agent-based models, and even the idea of local currencies that was discussed among some at the conference).

In the more immediate future, it looks like there is some discussion afoot about building a platform for agent-based modeling in which I hope Python plays a prominent role. There is a real power in using an expressive and dynamic language like Python that allows for rapid development. It is a general-purpose language that scientists and engineers can actually get excited about. In addition, the work of Paul Borrill’s company (Replicus) in creating an agent-based storage solution looks immediately promising. Perhaps Enthought can provide some tools to assist in managing such a system. I’m enthused and anxious to continue to support the improvement of using computers to help solve some of the world’s most challenging problems. There is much more that could be said, but I’m sure this blog (with no photos) is long enough.

by Travis Oliphant at May 12, 2009 11:36 PM

David Cournapeau

cournape


As I have already written in a previous post, I have moved away from bzr to git for most of my software projects (I still prefer bzr for documents, like my research papers). Many, if not most, comparisons of git with other tools focus on speed. True, git is quite fast for source code management, but I think this rather misses the point of git. It took me time to appreciate it, but one of git’s killer features for source code control is the notion of content tracking. Bzr (and, I believe, hg, although I could not find good information on that point) uses file ids, i.e. it tracks files, and a tree is a set of files. Git, on the contrary, tracks content, not files. In other words, it does not treat files individually, but always internally considers the whole tree.

This may seem like an internal detail, and an annoyance because it leaks into the UI quite a lot (the so-called index is linked to this). But it means that git can record the history of code, rather than of files, quite accurately. This is especially visible with git blame. One example: I recently started a massive surgery on the numpy C source code. Because of some C limitations, the numpy core C code was in a couple of gigantic source files, and I split them into more logical units. But this breaks svn blame heavily. If you just rename a file, svn blame can follow renames. But if you split one file into two, it becomes useless. Because git tracks the whole tree, the blame command can be asked to detect code moves across files. For example, git blame with rename detection gives me the following on one file in numpy:

dc35f24e numpy/core/src/arrayobject.c         1) #define PY_SSIZE_T_CLEAN
dc35f24e numpy/core/src/arrayobject.c         2) #include <Python.h>
dc35f24e numpy/core/src/arrayobject.c         3) #include "structmember.h"
dc35f24e numpy/core/src/arrayobject.c         4)
65d13826 numpy/core/src/arrayobject.c         5) /*#include <stdio.h>*/
5568f288 scipy/base/src/multiarraymodule.c    6) #define _MULTIARRAYMODULE
2f91f91e numpy/core/src/multiarraymodule.c    7) #define NPY_NO_PREFIX
2f91f91e numpy/core/src/multiarraymodule.c    8) #include "numpy/arrayobject.h"
dc35f24e numpy/core/src/arrayobject.c         9) #include "numpy/arrayscalars.h"
38f46d90 numpy/core/src/multiarray/common.c  10)
38f46d90 numpy/core/src/multiarray/common.c  11) #include "config.h"
0f81da6f numpy/core/src/multiarray/common.c  12)
71875d5c numpy/core/src/multiarray/common.c  13) #include "usertypes.h"
71875d5c numpy/core/src/multiarray/common.c  14)  
0f81da6f numpy/core/src/multiarray/common.c  15) #include "common.h"
5568f288 scipy/base/src/arrayobject.c        16)
65d13826 numpy/core/src/arrayobject.c        17) /*
65d13826 numpy/core/src/arrayobject.c        18)  * new reference
65d13826 numpy/core/src/arrayobject.c        19)  * doesn't alter refcount of chktype or mintype ---
65d13826 numpy/core/src/arrayobject.c        20)  * unless one of them is returned
65d13826 numpy/core/src/arrayobject.c        21)  */

You can notice that the original file can be found for every line of code in the new file. The original author and date may be found as well, I just removed them for the blog post.

This is truly impressive, and is one of the reasons why git is so far ahead of the competition IMHO. This kind of feature is extremely useful for open source projects, much more than rename support. I am ready to deal with quite a few (real) Git UI annoyances for this.

Edit

It looks like my example was not very clear. I am not interested in following the renames of the file: in the example above, the file was not arrayobject.c first, then renamed to multiarraymodule.c, and later to common.c. The file was created from scratch, with content taken from those files at some point. You can try the following simplified example. First, create two files, sum.c and prod.c:


#include <math.h>
double sum(const double* in, int n)
{
    int i;
    double acc = 0;

    for(i = 0; i < n; ++i) {
        acc += in[i];
    }

    return acc;
}

#include <math.h>

double prod(const double* in, int n)
{
    int i;
    double acc = 1;

    for(i = 0; i < n; ++i) {
        acc *= in[i];
    }

    return acc;
}

Commit to your favorite VCS. Then, you reorganize the code, and in particular you put the code of both files into a new file common.c. So you create a new file common.c:

#include <math.h>

double prod(const double* in, int n)
{
    int i;
    double acc = 1;

    for(i = 0; i < n; ++i) {
        acc *= in[i];
    }

    return acc;
}

double sum(const double* in, int n)
{
    int i;
    double acc = 0;

    for(i = 0; i < n; ++i) {
        acc += in[i];
    }

    return acc;
}

And commit. Then, try blame. Rename tracking won’t help at all, since nothing was renamed. On this very simple example, you could improve things by first renaming, say, sum.c to common.c, then adding the content of prod.c to common.c, but you will still lose the fact that the prod function comes from prod.c. git blame -C -M gives me the following:

^ae7f28a prod.c  1) #include <math.h>
^ae7f28a prod.c  2)
^ae7f28a prod.c  3) double prod(const double* in, int n)
^ae7f28a prod.c  4) {
^ae7f28a prod.c  5)         int i;
^ae7f28a prod.c  6)         double acc = 1;
^ae7f28a prod.c  7)
^ae7f28a prod.c  8)         for(i = 0; i < n; ++i) {
^ae7f28a prod.c  9)                 acc *= in[i];
^ae7f28a prod.c 10)         }
^ae7f28a prod.c 11)
^ae7f28a prod.c 12)         return acc;
^ae7f28a prod.c 13) }
^ae7f28a sum.c  14)
^ae7f28a sum.c  15) double sum(const double* in, int n)
^ae7f28a sum.c  16) {
^ae7f28a sum.c  17)         int i;
^ae7f28a sum.c  18)         double acc = 0;
^ae7f28a sum.c  19)
^ae7f28a sum.c  20)         for(i = 0; i < n; ++i) {
^ae7f28a sum.c  21)                 acc += in[i];
^ae7f28a sum.c  22)         }
^ae7f28a sum.c  23)
^ae7f28a sum.c  24)         return acc;
^ae7f28a sum.c  25) }

hg blame on the contrary will tell me everything comes from common.c. Even when using the rename trick, I cannot get more than the following with hg blame -f -c:

81c4468e59f9    sum.c: #include <math.h>
81c4468e59f9    sum.c:
81c4468e59f9    sum.c: double sum(const double* in, int n)
81c4468e59f9    sum.c: {
81c4468e59f9    sum.c:         int i;
81c4468e59f9    sum.c:         double acc = 0;
81c4468e59f9    sum.c:
81c4468e59f9    sum.c:         for(i = 0; i < n; ++i) {
81c4468e59f9    sum.c:                 acc += in[i];
81c4468e59f9    sum.c:         }
81c4468e59f9    sum.c:
81c4468e59f9    sum.c:         return acc;
81c4468e59f9    sum.c: }
3c1ac7db76ba common.c:
3c1ac7db76ba common.c: double prod(const double* in, int n)
3c1ac7db76ba common.c: {
3c1ac7db76ba common.c:         int i;
3c1ac7db76ba common.c:         double acc = 1;
3c1ac7db76ba common.c:
3c1ac7db76ba common.c:         for(i = 0; i < n; ++i) {
3c1ac7db76ba common.c:                 acc *= in[i];
3c1ac7db76ba common.c:         }
3c1ac7db76ba common.c:
3c1ac7db76ba common.c:         return acc;
3c1ac7db76ba common.c: }
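
To make the difference between tracking files and tracking content more concrete, here is a small Python toy. It is only a sketch of the idea, not git's actual algorithm: the function name `attribute_lines`, the `min_block` threshold and the use of `difflib` are all assumptions made for illustration. It attributes each line of a new file to whichever old file contains the same run of consecutive lines.

```python
from difflib import SequenceMatcher


def attribute_lines(new_lines, old_files, min_block=3):
    """Guess, for each line of a new file, which old file it was copied from.

    Toy heuristic only (not what git does internally): a line is attributed
    to an old file when it belongs to a run of at least `min_block`
    identical consecutive lines found in that file.
    """
    origin = [None] * len(new_lines)
    for name, old_lines in old_files.items():
        matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size < min_block:
                continue  # too short to be convincing (e.g. a lone "}")
            for j in range(block.b, block.b + block.size):
                if origin[j] is None:
                    origin[j] = name
    return origin


prod_c = """#include <math.h>

double prod(const double* in, int n)
{
    int i;
    double acc = 1;

    for(i = 0; i < n; ++i) {
        acc *= in[i];
    }

    return acc;
}""".splitlines()

# Hypothetical sum.c, derived from prod.c to keep the example short.
sum_c = [
    line.replace("prod", "sum").replace("acc = 1", "acc = 0").replace("*=", "+=")
    for line in prod_c
]

# common.c: all of prod.c, followed by sum.c without its #include line.
common_c = prod_c + sum_c[1:]

origin = attribute_lines(common_c, {"prod.c": prod_c, "sum.c": sum_c})
for number, (name, line) in enumerate(zip(origin, common_c), start=1):
    print(f"{name or '?':8} {number:2}) {line}")
```

A tool that only knows about file identity has nothing to attach the lines of a brand new common.c to, whereas looking at the content itself recovers both origins, which is the behaviour the git blame -C -M output above shows.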

by cournape at May 12, 2009 12:02 PM

Matthieu Brucher

Book review: Game Coding Complete

I got my hands on an old edition of this book (the second edition; the third is now available), and it seemed to me a good place for game developers to start.
Mike McShaffry has a lot of experience in the game field, and his goal is to share it with the readers. In every chapter, there are some anecdotes from his past, and it is a lot of fun to see studios falling into the same pitfalls as we do when we start coding.

Content and opinions

The book is split into four different parts. The first one starts with the fun you can get from coding a game, but also the troubles you will have. And what technology will you use? 2D? 3D? And what do they imply? As for any code, there is a set of general good practices (memory handling, scripts, …) that need to be addressed. The author sometimes did not follow them, and there are examples where this caused trouble.

To get the game running on a computer, another set of rules is needed. Without them, it is just hard to have a running game in the end. How is the game built (not everybody uses a tool to automatically build the game)? How do you interact with the game? A lot is written about this last issue, and as the author is used to DirectX, the clues are explained with it. But the advice can be used with other technologies. One just has to find the equivalent functions in the other framework. Obviously, it is not possible for the book to express this in every available framework, and it is also not the purpose of the book (it is not a book on a game engine, not a book on DirectX, …).

The third part is also mainly about DirectX, more exactly the 3D part. It lays down the basis for any 3D game engine, but it is not the book’s goal to be exhaustive about the design of a 3D engine. Also, Microsoft imposes a set of rules to get the appropriate “Windows compatible” logo, which is needed if you want to sell the game. The last chapter in this part tackles debugging the game. I have to say this is much needed, and too often it doesn’t appear in game programming books, although it is one of the pillars of programming. Different debugging techniques are addressed.

Finally, the last part tackles how the coding must be driven. Scheduling and milestones, testing and fixing the bugs or how the game will finally be published (what needs to be done at the end or after the end), all you need if you are in the game industry and you have to handle a commercial release.

Conclusion

My edition of the book is several years old, and I felt it, as several examples are outdated (the requirements for a game today, the DirectX version, the Windows versions that need to be supported, …). I couldn’t check whether this was updated in the third edition, but it should have been: the whole point of a new edition is to update these facts. So if you want to code a game, buy the latest edition of this book.

Game Coding Complete (Paperback)
by Mike McShaffry
ISBN: 1584506806


by Matt at May 12, 2009 08:05 AM

May 10, 2009

Gaël Varoquaux

Minimum spanning tree

Gary Ruben came up with the excellent idea of visualizing the minimum spanning tree of a Delaunay tessellation in addition to the Delaunay tessellation itself. After he sent me his code, I spent some time playing with it, because I found out that, with the right choice of visualization parameters, it gave me a nice understanding of what a minimum spanning tree is: a tree structure of minimal total length connecting all the vertices of the graph, and embedded in the graph. On the visualization, the Delaunay graph is displayed in grey, and the minimum spanning tree in thick, colored lines.

Minimum spanning tree

The minimum spanning tree is calculated using Prim’s algorithm, on the fully connected, distance-weighted graph of all points. One can clearly see that it is embedded in the Delaunay graph. In fact, I have tested that calculating a minimum spanning tree on the Delaunay graph or on the complete graph gives the same result.

The code to create this picture can be found here.
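
For readers who just want to see the idea, here is a minimal pure-Python sketch of Prim's algorithm on the complete, distance-weighted graph of a set of 2D points. This is not the code linked above: the function name `prim_mst` and the sample points are made up for illustration, and the sketch favours readability over speed (it is quadratic in the number of points).

```python
import math


def prim_mst(points):
    """Return the edges (i, j) of a minimum spanning tree.

    The graph is the complete graph on `points` (a list of (x, y) tuples),
    with the Euclidean distance as edge weight.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from each vertex to the tree
    best_from = [-1] * n        # tree vertex providing that cheapest link
    best_dist[0] = 0.0          # grow the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # The new tree vertex may offer cheaper links to the remaining ones.
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[u][0] - points[v][0],
                               points[u][1] - points[v][1])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


points = [(0, 0), (1, 0), (2, 0), (0, 3)]
edges = prim_mst(points)
total = sum(math.hypot(points[i][0] - points[j][0],
                       points[i][1] - points[j][1]) for i, j in edges)
print(edges, total)
```

Restricting the candidate edges to those of the Delaunay graph would make the same loop much cheaper for large point sets, which is why the observation that both graphs give the same tree is useful in practice.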

by gael at May 10, 2009 10:52 PM

Ondřej Čertík

My experience with running an opensource project

Nir Aides, the author of the excellent winpdb debugger, sent me the following email on September 21, 2008, so I asked him if I can copy his email and reply in form of a blog post (so that other people can comment and join the discussion) and he agreed. It took me almost a year to reply, but I made it. :)

Hi Ondrej,

How are you?

I am about to publish a new free software project, a new simple PHP framework, and I am interested in your advice.

You started SymPy and were able to make other people join you and develop it with you.
How did you do it?
How did it happen?
Did you actively call for other people or they spontaneously showed interest and joined you?
Are the other major contributor people who were your friends before you started the project?
Did you need to create or manage the project in a particular way to make it attractive to other people?
Are there things you are aware of that promote collaboration or demote it?

I was never successful in doing the same with Winpdb, which while it became reasonably popular, no one has ever joined me to develop it, except for a notable tutorial contribution by Chris Lasher which was developed independently.

Now with the new project, I am wondering what are my chances of making other people try it and take it on. On the one hand it is a new and fresh code base in an interesting field, on the other hand, why would anyone bother to spend their energy on this new project when they have Symfony or Drupal?

What do you think?

BTW, Ohloh believes you have a median of 19,000 lines of changed code per month since the start of their log. Can this be true? Is this humanly possible? According to it SymPy has over 1,000,000 lines of code? I can't understand these numbers. Winpdb has about 25,000 lines after 3 years of development. And from my experience 1,000,000 lines of code projects need about 20-50 full time developers to work on for 2-5 years which is about 40-250 man years. And as if this is not enough you are listed as owner in a dozen other projects in Google code and have enough time to become an awarded scientist. How is this possible?

http://www.ohloh.net/p/sympy/contributors/

BTW2, do you still use Winpdb? If you find yourself using it less, can you say what are the reasons, or what it would take to make it more useful?

BTW3, How is SymPy doing?

Cheers,
Nir



So my most honest answer to how to run a successful opensource project is: I don't know.

But nevertheless I tried to summarize some of my ideas and experience and some guidelines that I try to follow, maybe it will be useful to you Nir, or anyone else.

First of all, there has to be a public mailinglist (easily accessible), public bug tracker, nice webpage, easy to find downloads, frequent releases (once a month is good, but in the worst case at least 4 times a year) and a set of guidelines to follow in order to contribute. So that's a must, if the project doesn't have the above, it's almost impossible to become successful. However, that is just a start, just a playground. There are still many projects that have the above and yet they totally fail to attract developers.

So I think the most important principle is that I always think how to employ other people in what I do. If I have some plan in my head how to do something, e.g. how to move some things forward, I always create exact steps and put it to issues, or our mailinglist, so that each step can be done by someone who is completely new to sympy. So I try to look at things from other people's perspective and think -- ok, I quite like this SymPy project and I'd like to get this done (for example a new release, or something fixed, or implemented), but I have no idea how to start and what exactly needs to be done.

So what I try to do if someone comes to our list and asks for something, is that I create a new issue for it and think how I would fix it if I had time. Then write the necessary steps in the issue and invite the submitter to fix it and I offer help with explaining anything and guiding. Now there are two things that can happen. Either the submitter has time and a will to go forward and in this case he starts wrestling with it and whenever he has some code or a question, I need to find time, review it and offer some way out. Or the submitter is too busy, in which case the instructions simply rest in the issues and the next time someone asks for the feature, the instructions are already there. I don't have estimates how frequent either case is.

When I am working on something myself, I try not to code privately, but also put up issues first and put the steps needed in the issues, so that it's easy for other people to join in.

In general, the most precious thing for me is the fact that someone else sat down at his computer and wrote the patch. So I do everything possible to get new (or more) people interested in the development. Some people think that only super programmers can do a decent job and it's useless to invest time in people that may just have started with Python. They are wrong. Among the SymPy developers (around 65 people in total have contributed patches so far at the time of writing this post), we have all kinds of people. We have people from high school, we have a retired US army engineer, we have physicists, mathematicians, biologists, engineers, teachers, or just hobbyists, who do it for fun. Unfortunately, we do not have many women (I think no patch that made it into sympy was contributed by a woman, but I may be wrong), so if anyone has any ideas how to get more women involved, let me know (I know we have several women fans, so that's a good start:). We have people for whom sympy was the first open source project they ever contributed to, and people who are new to Python.

Many times the first patch that a new potential developer submits is not perfect, and usually it would be faster for me to write it myself than to help with the first patch. However, my rule is to always help the submitter do it. Sometimes he sends a second patch, or a third, and usually it needs less and less work on my side. It already pays off, because he is then able to fix things himself if he discovers a bug, and sympy has just won one more contributor.

So I came to the conclusion that all that is needed is enthusiasm. You don't even have to know Python (as you can learn all these things on the way) and you can still do useful things for us and really save us time.

To answer another question from Nir's email, SymPy has about 130000 lines of code and another about 20000 lines of tests, so I think those stats are wrong. Also the changed lines of code is in my opinion wrong, we usually have about 250 new patches per release (this depends how often we release and other things).

Yes, I am involved in a couple of other projects, e.g. Debian, Sage, ipython, scipy, hpfem.org (and a couple more), basically everything that has to do with numeric simulation and Python, but my activity there varies. The most time-consuming thing in the last couple of years was definitely school. I was finishing my master's in Theoretical Physics in Prague, then I moved to Reno, Nevada, and I have just finished my first semester here as a PhD student in Chemical Physics. Sometimes it was just crazy: e.g. I finished teaching at 7pm and, instead of going home to sleep, I stayed in my office, fixed 10 sympy issues that were holding off a release, finished at 1am, went home (by bike, since I don't have a car yet), slept a couple of hours and then did just school again for a week. Other people reviewed the issues in the meantime, and then I made the release (instead of sleeping again). In the last semester it was not unusual that I got home at 1am every weekday, then slept most of Saturday to catch up; on Sunday I did some laundry and shopping, and the rest of the time I did grading and homework for all my classes and teaching, with no time for anything else (e.g. no friends, no girls, no rest, no hobby, no opensource stuff, nothing). So sometimes one has to work pretty hard to get through it, but fortunately it's finally behind me. If all goes well, I should be just doing research from now on and have a real life too. Also I am sorry I didn't manage to reply sooner. :)

To answer the other questions:
Are the other major contributor people who were your friends before you started the project?
No, not a single major contributor was my friend before I started the project. Every single one of them became a developer using the procedure I described above, e.g. first showed up on the list or in the issues, and maybe even the very first patch was not a high quality one (and if I was stupid and arrogant, or didn't see the big potential, I would just ignore them). But when given a chance, they became extremely good developers and sympy would simply not be here without them.

Did you actively call for other people or they spontaneously showed interest and joined you?
I very much encourage everyone to contribute, but the initial interest must come from them, e.g. they at least have to show up on the mailinglist or in the issues, so that I know about them. But once I know they are interested in some issue, yes, I try to invite them to fix it, with my help.

One observation I made is that I have to always think in the spirit "how to earn new money, not how to spare the money I already have", e.g. when applied to sympy, how to get new developers, how to develop the new great things etc. Even if I am super busy as I was, I still have to think this way. Once I start thinking how to conserve and preserve what we already have, I am done, finished and that's the road to hell.

If I am open, positive, full of energy, I can see people joining me and we can do great things together. It probably sounds obvious, but it was not for me, when for example some people I worked with started their own projects when I got busy, and started to compete instead of helping sympy out. I felt betrayed, after so much work that I had invested into it, and started to become protective. And then I realised that's wrong. I can never stop other people from doing what they want to do. If they want to have their own project, they will have it. If they don't want to help sympy out, they won't (and what is more important, there is nothing wrong with either of those). It's that simple, and being protective only makes things worse.

There is also the question of the license that you use for the project, e.g. one should basically only choose between BSD (maybe also MIT or Apache), LGPL and GPL (there are also several versions of the GPL licenses). Unfortunately, the fact is that there are people who will never contribute code under a permissive BSD license (because it's not protecting their work enough), and there are also other people who really want the code to be BSD (or another permissive license), so that they can sell it without needing to consult lawyers about what they are or aren't allowed to do, and also so that they can combine it with any other code (opensource or not). It also depends on whether one wants to combine (and distribute) other codes together. So choosing a license is also important. I believe that for sympy BSD is the best and for other projects (like Sage) GPL is the best, and one has to decide on a case by case basis. For Winpdb, I would make it BSD too, since you can get more people using it.

To conclude, SymPy is a little more than 2 years old, and it has been a great ride so far and more things are coming, e.g. this summer we have 5 Google Summer of Code students, people are starting to use it in their research, and we plan to use it in our codes at our group here in Reno too, so things look promising. I am really glad we managed to build such a community, so that when I am busy, as I was the last semester, other people help out with patches, reviews and other things, and the project doesn't stall. Now that I am rid of my school duties, we can move things forward a lot.

So maybe you can get inspired by some of the ideas above. I am also interested in any discussion about this (feel free to post a comment below, or send me an email, or just write to a sympy list about what you think).

by Ondřej Čertík (noreply@blogger.com) at May 10, 2009 01:47 PM

May 08, 2009

David Cournapeau

cournape


For quite some time, I wanted to add code coverage to the C part of numpy. The upcoming port to python 3k will make this even more useful, and besides, Stefan Van Der Walt promised me a beer if I could do it.

There are several tools to do code coverage of C code – the most well known is gcov (I obviously discard non-free tools – those tend to be fairly expensive anyway). The problem with gcov is its inability to do code coverage for dynamically loaded code such as python extensions. The solution is thus to build numpy and statically link it into python, which is not totally straightforward.

Statically linking simple extensions

I first looked into simpler extensions: the basic solution is to add the source files of the extensions into Modules/Setup.local in python sources. For example, to build the zlib module statically, you add

*static*
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz

Then run make; this will statically link the zlib module to python. One simple way to check whether the extension is indeed statically linked is to look at the __file__ attribute of the extension. In the dynamically loaded case, __file__ returns the location of the .so, but the attribute does not exist in the static case.
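
The __file__ check can be written as a couple of lines of Python. This is only a sketch: the helper name `looks_statically_linked` is made up, and it is demonstrated on `sys` (always compiled into the interpreter) and `json` (loaded from a file), since whether zlib is static depends on how your python was built.

```python
import json
import sys


def looks_statically_linked(module):
    """Heuristic from the text: modules compiled into the interpreter
    have no __file__ attribute, while modules loaded from disk do."""
    return not hasattr(module, "__file__")


for module in (sys, json):
    kind = "static/built-in" if looks_statically_linked(module) else "loaded from a file"
    print(module.__name__, "->", kind)

# sys.builtin_module_names lists the modules compiled into this interpreter,
# so a statically linked zlib should show up there as well.
print("zlib" in sys.builtin_module_names)
```

On a stock python the last line will usually print False, and it should switch to True once the Modules/Setup.local change above has been applied and python rebuilt.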

Code coverage

To use gcov, two compilation flags are needed, and one link flag:

gcc -c -fprofile-arcs -ftest-coverage …
gcc … -lgcov

Note that -lgcov must be near the end of the link command (after other libraries flags). To do code coverage of e.g. the zlib module, the following works in Modules/Setup.local:

*static*
zlib zlibmodule.c -I$(prefix)/include -fprofile-arcs -ftest-coverage -L$(exec_prefix)/lib -lz -lgcov

If everything goes right after a make call, you should have two files, zlibmodule.gcda and zlibmodule.gcno, in your Modules directory. You can now run gcov in Modules to get code coverage:

cd Modules && gcov zlibmodule

Of course, since nothing was run yet, the code coverage is 0. After running the zlib test suite, things are better though:

./python Lib/test/test_zlib.py && gcov -o Modules Modules/zlibmodule

The -o option tells gcov where to look for gcov data (the .gcda and .gcno files), and the output is

File ‘./Modules/zlibmodule.c’
Lines executed:74.55% of 448

Build numpy statically

I quickly added a hack to numscons to build the numpy C code statically instead of dynamically; it lives in the static_build branch, available on github. As it is, numpy will not work: some source code modifications are needed. Those modifications reside in the static_link branch on github as well.

Then, to statically build numpy with code coverage:

LINKFLAGSEND="-lgcov" CFLAGS="-pg -fprofile-arcs -ftest-coverage" $PYTHON setupscons.py scons --static=1

where $PYTHON refers to the python you built from sources. This will build every extension as a static library. To link them to the python binary, I simply added a fake source file and linked the numpy libraries against it in Modules/Setup.local:

*static*
multiarray fake.c -L$LIBPATH -lmultiarray -lnpymath
umath fake.c -L$LIBPATH -lumath -lnpymath
_sort fake.c -L$LIBPATH -l_sort -lnpymath

where LIBPATH refers to the path where the static numpy libraries can be found (e.g. build/scons/numpy/core in your numpy source tree). To run the test suite, one has to make sure to import a numpy from which the multiarray, umath and _sort extensions have been removed; it will crash otherwise (as the extensions would be present twice in the python process, once as dynamically loaded code and once as statically linked code). The test suite kind of runs (~1500 tests), and one can get code coverage afterwards. For the multiarray extension, here is what I get:

File ‘build/scons/numpy/core/src/multiarray/common.c’
Lines executed:52.56% of 293
build/scons/numpy/core/src/multiarray/common.c:creating ‘common.c.gcov’

File ‘build/scons/numpy/core/include/numpy/npy_math.h’
Lines executed:50.00% of 12
build/scons/numpy/core/include/numpy/npy_math.h:creating ‘npy_math.h.gcov’

File ‘build/scons/numpy/core/src/multiarray/arraytypes.c’
Lines executed:62.23% of 1030
build/scons/numpy/core/src/multiarray/arraytypes.c:creating ‘arraytypes.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/hashdescr.c’
Lines executed:68.38% of 117
build/scons/numpy/core/src/multiarray/hashdescr.c:creating ‘hashdescr.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/numpyos.c’
Lines executed:81.48% of 189
build/scons/numpy/core/src/multiarray/numpyos.c:creating ‘numpyos.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/scalarapi.c’
Lines executed:47.43% of 350
build/scons/numpy/core/src/multiarray/scalarapi.c:creating ‘scalarapi.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/descriptor.c’
Lines executed:61.96% of 1028
build/scons/numpy/core/src/multiarray/descriptor.c:creating ‘descriptor.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/flagsobject.c’
Lines executed:42.31% of 208
build/scons/numpy/core/src/multiarray/flagsobject.c:creating ‘flagsobject.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/ctors.c’
Lines executed:64.69% of 1583
build/scons/numpy/core/src/multiarray/ctors.c:creating ‘ctors.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/iterators.c’
Lines executed:70.41% of 774
build/scons/numpy/core/src/multiarray/iterators.c:creating ‘iterators.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/mapping.c’
Lines executed:77.95% of 721
build/scons/numpy/core/src/multiarray/mapping.c:creating ‘mapping.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/number.c’
Lines executed:51.80% of 361
build/scons/numpy/core/src/multiarray/number.c:creating ‘number.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/getset.c’
Lines executed:44.09% of 372
build/scons/numpy/core/src/multiarray/getset.c:creating ‘getset.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/sequence.c’
Lines executed:50.00% of 60
build/scons/numpy/core/src/multiarray/sequence.c:creating ‘sequence.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/methods.c’
Lines executed:47.35% of 942
build/scons/numpy/core/src/multiarray/methods.c:creating ‘methods.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/convert_datatype.c’
Lines executed:56.11% of 442
build/scons/numpy/core/src/multiarray/convert_datatype.c:creating ‘convert_datatype.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/convert.c’
Lines executed:66.67% of 183
build/scons/numpy/core/src/multiarray/convert.c:creating ‘convert.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/shape.c’
Lines executed:76.81% of 345
build/scons/numpy/core/src/multiarray/shape.c:creating ‘shape.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/item_selection.c’
Lines executed:55.07% of 937
build/scons/numpy/core/src/multiarray/item_selection.c:creating ‘item_selection.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/calculation.c’
Lines executed:59.08% of 523
build/scons/numpy/core/src/multiarray/calculation.c:creating ‘calculation.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/usertypes.c’
Lines executed:0.00% of 111
build/scons/numpy/core/src/multiarray/usertypes.c:creating ‘usertypes.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/refcount.c’
Lines executed:66.67% of 129
build/scons/numpy/core/src/multiarray/refcount.c:creating ‘refcount.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/conversion_utils.c’
Lines executed:59.49% of 316
build/scons/numpy/core/src/multiarray/conversion_utils.c:creating ‘conversion_utils.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/buffer.c’
Lines executed:56.00% of 25
build/scons/numpy/core/src/multiarray/buffer.c:creating ‘buffer.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/scalartypes.c’
Lines executed:42.42% of 877
build/scons/numpy/core/src/multiarray/scalartypes.c:creating ‘scalartypes.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/ucsnarrow.c’
Lines executed:89.36% of 47
build/scons/numpy/core/src/multiarray/ucsnarrow.c:creating ‘ucsnarrow.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/arrayobject.c’
Lines executed:58.75% of 514
build/scons/numpy/core/src/multiarray/arrayobject.c:creating ‘arrayobject.c.gcov’

File ‘build/scons/numpy/core/src/multiarray/multiarraymodule.c’
Lines executed:49.12% of 1134
build/scons/numpy/core/src/multiarray/multiarraymodule.c:creating ‘multiarraymodule.c.gcov’

The figures themselves are not that meaningful ATM, since the test suite does not run completely, and the built numpy is a quite bastardized version of the real numpy.
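
A per-file listing like the one above is easier to digest once it is aggregated. The following is a rough sketch, not part of the build itself: the function names parse_gcov_summary and overall_coverage are hypothetical, and the parser only assumes the two-line "File ... / Lines executed:..." format shown in the output above.

```python
import re

# gcov prints a "File" line followed by a "Lines executed" line for each
# source file. The quote characters around the file name vary with the
# locale, so both ASCII and typographic quotes are accepted.
FILE_RE = re.compile(r"^File\s+['‘’`](?P<path>.+?)['‘’`]\s*$")
LINES_RE = re.compile(r"^Lines executed:\s*(?P<pct>[\d.]+)% of (?P<total>\d+)\s*$")


def parse_gcov_summary(text):
    """Return a list of (path, percent, total_lines) tuples."""
    results = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = FILE_RE.match(line)
        if match:
            current = match.group("path")
            continue
        match = LINES_RE.match(line)
        if match and current is not None:
            results.append(
                (current, float(match.group("pct")), int(match.group("total")))
            )
            current = None
    return results


def overall_coverage(results):
    """Average the per-file percentages, weighted by each file's line count."""
    total = sum(n for _, _, n in results)
    if total == 0:
        return 0.0
    covered = sum(pct / 100.0 * n for _, pct, n in results)
    return 100.0 * covered / total
```

Feeding the captured gcov output to parse_gcov_summary and then to overall_coverage gives a single number for the whole extension, which is handier for tracking progress than thirty separate percentages.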

The numpy modifications, although small, are very hackish; I just wanted to see if that could work at all. If time permits, I hope to be able to automate most of this, and have a system where it can be integrated in the trunk. I am still not sure about the best way to build the extensions themselves. I can see other solutions, such as producing a single file per extension, with every internal numpy header/source integrated, so that they could be easily built from Setup.local. Or maybe a patch to the python sources so that make in python sources would automatically build numpy.

by cournape at May 08, 2009 05:04 AM

May 06, 2009

Titus Brown

Python in the humanities?

I'm writing some proposals to expand support for Python infrastructure (think cross-platform build and test farms a la Snakebite) and for the Mellon Foundation application, I'd like to find out how Python is being used in the humanities. I found NLTK, the Natural Language Toolkit; what else is big?

thanks, --titus

May 06, 2009 06:41 PM

May 05, 2009

Matthieu Brucher

Book review: Refactoring to patterns

After last week’s book review of Martin Fowler’s Refactoring, I’d like to review another book, more oriented towards patterns and refactoring.

Content and opinions

First, this book could be seen as a follow-up to Refactoring, as almost all the described processes use steps from it. The first chapters explain what to expect from refactoring (how it affects software architecture), patterns and how to detect code that needs refactoring. Like the original Design Patterns book, the catalog is split into different categories (Creation, Simplification, Generalization, Protection, Accumulation or Utilities), and each pattern is described as in the GoF book (which suits me quite well).

The code language is Java, so it may sometimes be difficult to find the equivalent construction in your favorite language, but it is doable if you know Java basics.

Conclusion

This book quotes Martin Fowler several times, but this is not related to the fact that this book is in the “Fowler signature” collection of the publisher. Refactoring to Patterns addresses what Refactoring couldn’t and goes further, without imposing, just by suggesting.

Refactoring to Patterns (Addison-Wesley Signature Series) (Hardcover)
by Joshua Kerievsky
ISBN: 0321213351

Price: USD 49.63
52 used & new available from USD 34.98


by Matt at May 05, 2009 08:04 AM

May 01, 2009

Gaël Varoquaux

Extracting the data from the Delaunay triangulation

Gary Ruben just asked me if it was possible to retrieve the triangulation information from my previous Delaunay example. Actually the reason I came up with this example is that Emmanuelle Gouillart, my partner[*], needed to do Delaunay triangulation on some data. She was kind enough to extract that code from her code base. Here it is.

[*] The various languages do not seem to have evolved quickly enough to cope with the fact that people can now have a stable long-term relationship with someone they are not married to. What word should I be using here: ‘girlfriend’, ‘partner’… ?

by gael at May 01, 2009 03:42 PM

April 30, 2009

Enthought

EPD 4.2.30201 released

I am pleased to announce that EPD (Enthought Python Distribution) version
4.2.30201 has been released.  You may find more information about EPD, as
well as download a 30 day free trial here:

http://www.enthought.com/products/epd.php

You can check out the release notes here:

https://svn.enthought.com/epd/wiki/Py25/4.2.30201/RelNotes

The Enthought Python Distribution (EPD) is a “kitchen-sink-included”
distribution of the Python Programming Language, including over 80
additional tools and libraries. The EPD bundle includes NumPy, SciPy,
IPython, 2D and 3D visualization, database adapters, and a lot of
other tools right out of the box.

http://www.enthought.com/products/epdlibraries.php

It is currently available as a single-click installer for Windows XP (x86),
Mac OS X (a universal binary for OS X 10.4 and above),
RedHat 3, 4 and 5 (x86 and amd64), as well as Solaris 10 (x86).

EPD is free for academic use.  An annual subscription including installation
support is available for individual and commercial use.  Additional
support options, including customization, bug fixes and training classes
are also available:

http://www.enthought.com/products/support_level_table.php

- Ilan

by Ilan Schnell at April 30, 2009 08:16 PM

April 28, 2009

Matthieu Brucher

Book review: Refactoring: Improving the design of existing code

I read this book when I started my PhD thesis. It helped me lay down the basics of software design.

It was the first book where I found the code smell concept. And my former code really smelt…

Content and opinions

This book became a reference for me. The patterns catalog (because I consider them as patterns) seems almost exhaustive to me. Some of the steps described are too easy (such as inlining a function). It may be more difficult at first to extract a method from another, but it is still the basis of refactoring.

The main use of the book isn’t saying what one already knows (inlining, splitting a function, …), it is showing a new step, one that one didn’t think about and that solves the problem at hand. Stating the problem, the smell, is also a main focus of the book, where Martin Fowler gives hints to distinguish between the different smells.

The author is one of the world-renowned heralds of unit testing, so it is no wonder unit tests have a central place in the different patterns. This way, the refactoring cannot change the code behavior. The given examples are also clear and simple enough, completely described step by step (mainly because, as the book goes forward, new patterns use preceding ones).

Finally, the code is written in Java, but it isn’t a problem for someone used to object-oriented languages. Useful tools are also presented, but they are old, dating from the time when the book was written (and they are mainly geared toward Java).

Conclusion

This was a really simple book. Since I closed it, I haven’t had to open it again, as the patterns are pretty simple. But still, as for every pattern, you first have to understand it (and read it) to acknowledge it. And I tend to refactor regularly following the ideas behind this book.

Refactoring: Improving the Design of Existing Code (Addison-Wesley Object Technology Series) (Hardcover)
by Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts
ISBN: 0201485672

Price: USD 46.79
85 used & new available from USD 30.40


by Matt at April 28, 2009 08:15 AM

Enthought

hash() differences for 32bit and 64bit systems

I was working on a client/server project where we send collections of data across the wire. I needed a method of matching datasets on the client and server, and the python hash function seemed ideal. I suspected that the hash function might have different behaviour on different systems, but conveniently forgot to test it until after I tried to deploy it.

I expected differences, but I didn’t really know to what extent, so I did a little research. So far, ints are the only thing I have found that hash the same, because int’s __hash__ function just returns the int value. Otherwise, Python’s hash functions depend on multiplication using long ints.

While doing my research, I found a page discussing hashing in Python 2.3. The algorithms are similar to the C implementations in Python 2.6.

Of course, I got bitten because Python 2.5 on OS X 10.4 and 64bit RedHat 5 didn’t hash my objects the same. In the end, I serialized the data’s metadata and computed an MD5 digest instead, which requires more CPU cycles, but at least it works…
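
For illustration, here is a minimal sketch of that kind of platform-independent digest. It is not the code from the project: the function name dataset_digest, the use of JSON as the serialization, and the example metadata keys are all assumptions made for the sake of the example.

```python
import hashlib
import json


def dataset_digest(metadata):
    """Return a hex MD5 digest of a metadata dict.

    Assumption: the metadata is JSON-serializable. Sorting the keys and
    fixing the separators gives one canonical text form, so the digest
    depends only on the content, not on dict ordering or on the word size
    of the interpreter that computed it.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical metadata describing one dataset.
    print(dataset_digest({"name": "run-7", "shape": [128, 64], "dtype": "float64"}))
```

Client and server each compute the digest locally and compare the resulting strings, which sidesteps hash() entirely.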

by Bryce Hendrix at April 28, 2009 05:19 AM

April 27, 2009

Gaël Varoquaux

Mayavi image of the … month

Tonight I sat down and played a bit with VTK’s Delaunay tessellation filter. I wanted to inspect the local structure of a graph created by Delaunay tessellation of random points. To see the structure better, I selected a slab of the resulting unstructured grid. I think the image is not only instructive to explain what a Delaunay tessellation is, it also looks pretty cool. Here is the image and the Mayavi script that creates it.

Delaunay interpolation

by gael at April 27, 2009 09:42 PM

Enthought

Try out the Beta2 of the next version of EPD

I missed a date with my wife on Friday to help push the beta release of EPD out for all 10 platforms we are currently supporting (WinXP, WinVista, Mac OS X 10.5(10.4)-intel(ppc), RH3 (x86, amd64), RH5 (x86, amd64)). The 6 different binaries were uploaded to our download servers early Saturday morning (4:00 am Central Time). I’m excited for people to try the new release as it brings together recent versions of NumPy, SciPy, matplotlib, and IPython, along with many additional tools.

One of the things I’m very enthused to have people try is an alpha version of EPDLab which comes in the distribution. EPDLab is an open-source Envisage application which offers an IPython shell along with a linked code editor to allow highly interactive development. EPDLab also contains a “search documentation strings” widget which uses Whoosh and some Robert Kern indexing Fu to provide a very useful search for all of the powerful tools pre-packaged with EPD.

Get the beta2 today and start using a very full-featured distribution of Python across your organization. Download Beta2.

If you try this recent beta, I’d love to hear from you about any feedback you may have (both positive and negative). Email me at info@enthought.com. The final version of the next release of EPD (4.2.30201) should be out by early next week.

-Travis

by Travis Oliphant at April 27, 2009 07:34 PM

Titus Brown

TALK: Open Source at Microsoft: The Past, Present, and Future

I'd like to invite you to attend the last of the Michigan State University CSE colloquia for the 2008-2009 academic year: jointly sponsored as an AT&T Visiting Lecturer by the MSU LCT, and the CSE department, Sam Ramji will speak about

Open Source at Microsoft: The Past, Present and Future

in CommArts room 147, Friday May 1, at 11:00am. I encourage you all to attend and to forward this on to others who might be interested! As you know, open source software is playing an increasingly big part in education, academia, science, and business, and so I expect this to be a very interesting talk.

Contact me at ctb@msu.edu for further information.

--

Abstract:

Since Microsoft established its Open Source Lab in Redmond more than five years ago, it has worked with many open source players to make Windows the best platform for all applications to run on. But this has not been without its challenges and there is a lot more work to be done on this front. This talk will cover the thinking behind Microsoft's current open source strategy and what this means for the software engineers of the future. It will also spotlight some innovative Open Source projects the company is supporting at universities across the world.

Biography:

Sam Ramji is the Senior Director of Platform Strategy leading Microsoft's platform strategy efforts across the company, including long-term strategic planning in the Windows Server and Tools organization. Sam's primary focus is to drive Microsoft's Linux and Open Source Strategy, working together with Microsoft technology development teams and open source communities to build interoperable solutions.

Prior to his current role at Microsoft, Sam was a Director of Emerging Business working on the Silicon Valley Campus where he managed relationships with Venture Capitalists and entrepreneurs. Prior to joining Microsoft, Sam led technical product strategy at BEA Systems, engineering teams building large-scale applications on Open Source software (at Ofoto.com) as well as hands-on development of client, client-server, and distributed applications on Unix, Windows, and Macintosh at prior companies.

Sam holds a Bachelor of Science degree in Cognitive Science from the University of California at San Diego, and is a member of the Institute for Generative Leadership.

April 27, 2009 07:18 PM

April 23, 2009

Titus Brown

Open Source is like a mistress

Open source coding is like a not-so-demanding mistress: I work on it at night, surreptitiously, after my wife and daughter are asleep. twill and figleaf are like bastard children, who only get attention when I can spare it from my "real" family (my teaching, research or my actual family, depending ;)

Sigh.

--titus

April 23, 2009 05:46 AM

April 21, 2009

Matthieu Brucher

Book review: Head First Design Patterns

If last week’s book review was too complicated for you, perhaps this book is more suited for you. Fewer design patterns, but a funnier way to describe them.

Content and opinions

Only twelve patterns are explained, but more important is the fact that each of them is detailed, with examples and exercises, as well as important phrases displayed as images. Along with humor, these are the characteristics of the “Head First” collection.

Instead of using the same example throughout the book, each chapter has more or less its own concrete example. The last chapters explain how the patterns can interact to create complex applications, or how they can define new design patterns (such as MVC for instance), then how this will change your way of thinking about software architecture.

Conclusion

This book focuses on the most used and perhaps useful design patterns. This way, it can present them differently than a simple catalog, but I have to say that this approach will not be suited for everyone. For instance, I appreciate books that go straight to the point.

This kind of book is well suited for people that want to start with design patterns, but not for people familiar with them: they should go for catalogs.

Head First Design Patterns (Paperback)
by Elisabeth Freeman, Eric Freeman, Bert Bates, Kathy Sierra
ISBN: 0596007124

Price: USD 29.67
77 used & new available from USD 19.94


by Matt at April 21, 2009 08:13 AM

April 20, 2009

Titus Brown

What is disco?

Anyone out there used disco (http://discoproject.org/)? Comments, good/bad/neutral?

From the page:

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.

The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.

thanks!

--titus

April 20, 2009 07:43 PM

April 14, 2009

Matthieu Brucher

Book review: Design patterns: Elements of Reusable Object-Oriented Software

As I’ve said before, I’ve done several book reviews in the past. I will start with a small series on design patterns books.

This book is one of the “must-haves” in your library. If you write some code or if you manage some IT or Computer Science projects, you will want this book to lay down the basic software architecture.

Content and opinions

The first two chapters are an introduction and explain the reasons for the existence of design patterns, how they should be used, good and bad practices, … Design patterns without rules to apply them are useless (as the original architectural patterns are useless without drawing skills). A practical example is the object of the second chapter.

Design patterns are presented in a three-part catalog. Each pattern is described by a complete explanation, a UML diagram, the interactions between the pattern elements, as well as some implementation solutions (not all solutions can be written out, as they are language-dependent).
Creational patterns are about creating new objects. They include the abstract factory (constructing several objects of different kinds), the builder (a more elaborate constructor), the factory method (overloading a class method to create objects based on different classes), the prototype (creating new objects by cloning an instance) and the singleton (creating only one instance of a given class).
Structural patterns are more about the actual software architecture. They include the adapter (translating an interface to another one), the bridge (separating an interface from different implementations), the composite (allowing several objects of a hierarchy to be composed together), the decorator (adding characteristics to an object), the facade (offering an interface to several classes), the flyweight (sharing the same objects between instances so as to reduce memory overhead) or the proxy (using one object to access another, potentially hidden, one).
Behavioral patterns enable a piece of software to change its own behavior. They include the chain of responsibility (allowing requests to be processed by whoever can), the command (creating complex requests), the interpreter (describing how a language can be processed), the iterator (providing a way of browsing the content of a data container), the mediator (allowing communication between different classes), the memento (enabling restoring the state of an object), the observer (sometimes also called listener, creating a way for instances to be updated/called by another one), the state (allowing changing behavior on the fly), the strategy (providing several ways of doing something), the template method (providing a skeleton for an algorithm) or the visitor (allowing execution of code for every element of an object’s content). This set of patterns is perhaps the most heterogeneous one (although the state and the strategy are structurally the same, the only difference being the interpretation of their actions).

Conclusion

This book, sometimes referred to as the GoF book, lays down the basics of software design. These 23 patterns are not the only ones you may use (some of them are also seldom used), but they are used to make the more complicated ones. If there is one design patterns book you should buy, it is this one.

Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Professional Computing Series) (Hardcover)
by Erich Gamma, Richard Helm, Ralph Johnson, John M. Vlissides
ISBN: 0201633612

Price: USD 38.96
124 used & new available from USD 23.06


by Matt at April 14, 2009 08:09 AM

April 12, 2009

Titus Brown

Twill lives!

One of the advantages of this year's PyCon was that it was (again) held in Chicago, the home town of Leapfrog Online. Since they use twill quite a bit, and were bothered by some of the poor design decisions and bugginess, they were keen to get together with me to move twill forward. So we scheduled a sprint for the Monday after PyCon.

In preparation for the sprint, I did a bit of research into how widely twill was being used. Downloads only roughly correlate, but I was surprised to discover that in just the last year, there were over 6,000 downloads from my site; this doesn't count Debian users, who can install it from one of the Debian dists. I'd also been surprised by the number of people at PyCon who came up to me and told me that they were using twill internally in their companies -- at least two very expert groups had settled on it for some of their internal monitoring and testing. Very cool! What this told me is that twill is very nice, simple and usable for many people and we shouldn't get too adventuresome; good thing to know ;).

The sprint basically consisted of us talking through a few fundamental issues like bundling and future development, then fixing a few items, while I forwarded on all of the bug reports I've gotten over the last two years.

The source code has now moved to code.google.com/p/twill and you can see all of the issues in the usual place.

During the sprint we made a few decisions:

  • 0.9.2 is Coming Real Soon, as a largely feature/API-stable release that fixes a number of simple bugs and integrates the latest mechanize.
  • for 0.9.2 and 1.0 we will provide both bundled and unbundled versions of twill; the bundled versions will contain BeautifulSoup, mechanize, ClientForm, and pyparsing. The unbundled version will simply specify what versions of those packages it needs. This unbundling will help packagers out while letting individuals (like, say, Windows users) install twill easily.
  • 1.0 is further down the road, but will only add a few features. The main goal of 1.0 is to be nice & stable.
  • 2.0 and beyond is on the table but exactly what it will be is unclear. I have my own ideas but since I'm not doing much Web developing I may let others take over.

Since the sprint, Pam Z. finished putting the issues into the tracker and we've been slowly trying to work through them.

Props to Pam Z., Nat W., Kevin B., and Jesse for coming to the sprint, and to Terry Peppers and Leapfrog for pushing it! And thanks to Leapfrog for an excellent steak dinner afterwards ;)

--titus

April 12, 2009 02:17 PM

April 09, 2009

Enthought

We may drop OS X 10.4 and PPC support from future EPD releases

We’ve had a number of recent internal discussions about EPD during which the phrases “that won’t work on OS X 10.4” or “does upstream have PPC support?” came up quite often.  For example, a recent discussion about the importance of relocatable EPD egg installs sputtered because we realized Mac OS X 10.4 doesn’t support RPATH settings in binary headers, which meant we’d have to do something special just for that platform.

Once we realized this commonality, we next wondered how important OS X 10.4 and PPC support actually is for the EPD user community.  Thus the point of this blog post: to get some community input.  This is your chance to speak up if you need OS X 10.4 and/or PPC support.  I can’t promise that a single ‘yes’ will sway our decision making, but certainly the more people who speak up, the more likely we are to try to continue the support.

by Dave Peterson at April 09, 2009 06:00 PM

Gaël Varoquaux

Long sys.path and consequences, one more reason not to use easy_install

For those who don’t know, sys.path is the list of directories that the Python interpreter traverses at each module import to look for the file of the imported module.

This blog post is about the consequences of having a long sys.path. I’ll try and make it short, but I would have a lot to say. I am just reacting to Noah Gift’s post on performance improvement, not writing a full essay on why overloading sys.path is considered harmful.

When using easy_install (or setuptools), each new project is installed in a different directory, and the directory is added at runtime to the sys.path (the addition at runtime confuses many users who are not aware of it). As a result, you quickly end up with more than 40 directories on your sys.path. These directories are ‘stat-ed’ one after the other on each module import. Thus if you have a long sys.path, there is a large number of system calls to read directories. To check this out, simply try:

strace python -c "import foobar" 2>&1 | less

You can see the amount of noise created by a simple (failing) import statement. On a system with high latency (such as an NFS, as we use at work), this is very costly.
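
To get a quick feel for how long your own sys.path is without reaching for strace, a few lines of Python suffice. This is only a sketch: the helper name summarize_sys_path is made up, and treating any entry ending in .egg as installed by easy_install is an assumption.

```python
import sys


def summarize_sys_path(paths=None):
    """Count the entries on a search path and pick out the .egg entries.

    Every entry is a location the interpreter may have to probe when it
    resolves an import, so a long list means more system calls for each
    lookup that falls through.
    """
    if paths is None:
        paths = sys.path
    # Assumption: entries ending in ".egg" were added by easy_install.
    eggs = [p for p in paths if p.rstrip("/\\").endswith(".egg")]
    return {"entries": len(paths), "eggs": len(eggs)}


if __name__ == "__main__":
    print(summarize_sys_path())
```

If the egg count makes up most of the entries, the runtime additions made by setuptools are the likely cause of the noise in the strace output.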

Noah joyfully reports performance improvements by hijacking the Python import mechanism. I claim that part of what Noah has done is not really hijacking the import mechanism, it is undoing the hijacking performed by setuptools.

I know I am being rude, but many people raised this point before, and it is not getting any traction from the setuptools maintainer. I claim that you should not be using setuptools or easy_install if you want performance or control. I claim that you should not be using setuptools unless you understand well what you are doing (which defeats the name easy_install).

The way I install packages via easy_install when I want good control is to use a virtual environment to discover the dependencies, and then:

easy_install -Zeab . package_name

to download the package for each required package, and

python setup.py install --single-version-externally-managed --record ./foobar

if the package itself is using setuptools.

As you can see, setuptools makes it really hard to do a clean install. It’s a design choice :(.

Another alternative is to use pip which I strongly encourage.

by gael at April 09, 2009 07:43 AM

April 08, 2009

Enthought

Intro to Scientific Computing in Python, June 15-19, Austin TX

Enthought is offering “Introduction to Scientific Computing in Python” at our offices in Austin, Texas from June 15th to June 19th. This course is intended for scientists and engineers who want to learn to use Python for day-to-day computational tasks.

  • Day 1: Introduction to the Python Language
  • Day 2: Array Calculations with NumPy
  • Day 3: Numeric Algorithms with SciPy
  • Day 4: Interfacing Python with Other Languages
  • Day 5: Interactive 2D Visualization with Chaco

The cost for the course is $2500. Please see the course description on the Enthought website for details.

Space is still available in our course on Python for Science, Engineering, and Financial Analysis, May 18th to 21st, in New York City.

by Janet Swisher at April 08, 2009 11:21 PM

April 07, 2009

Matthieu Brucher

Profiling with Valgrind

Profiling comes in three different flavors. The first is emulation, where the behavior of a processor is emulated; the second is sampling, where the profiler samples the status of a program at regular intervals; and the last is instrumentation, where the profiler gets information when a subroutine is called and when it returns. As with the Heisenberg uncertainty principle, profiling changes the exact behavior of your program. This is something you have to remember when analyzing a profile.
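Instrumentation is easy to illustrate in Python itself, since the interpreter can call a hook each time a function is called or returns. The following is a minimal sketch using sys.setprofile; count_calls and fib are illustrative names of mine and are not part of any profiler discussed in this post.

```python
import sys
from collections import Counter


def count_calls(func, *args):
    """Run func(*args) and count how often each Python function is entered."""
    counts = Counter()

    def hook(frame, event, arg):
        # "call" is reported when a Python function is entered;
        # "return" (ignored here) is reported when it exits.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return counts


def fib(n):
    """Deliberately naive Fibonacci, used only to generate many calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    print(count_calls(fib, 5)["fib"])  # 15 calls for fib(5)
```

The hook itself costs time on every call, which is a small-scale version of the perturbation mentioned above.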

Valgrind is an open-source emulation profiler. It is freely available on standard Linux platforms. Because it emulates the processor, it is far slower than the actual program, which means that the cost of I/O is underestimated. The advantage is that you get every detail of the memory behavior (cache misses, for instance). Valgrind does not emulate every processor, but you can tune it to approximate your own.

This is more or less a translation of my French tutorial on Valgrind profiling.

Calling Valgrind

Calling the profiler is really easy:

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes program arguments

Here, I ask valgrind to use the callgrind profiler plugin, to dump the executed instructions (which shows which part of a function is really costly, not only which function), to simulate the cache (which helps improve processor usage) and to collect jumps (which gives a dynamic view of the program behavior). Of course, the program must have been compiled with the appropriate compilation options (at least -g).
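If you profile the same program repeatedly, the command line above can be wrapped in a small script. This is only a sketch: callgrind_command and run_callgrind are hypothetical helper names, the options are exactly the ones listed above, and valgrind is assumed to be on the PATH.

```python
import subprocess


def callgrind_command(program, *arguments):
    """Build the valgrind/callgrind command line as a list of strings."""
    return [
        "valgrind",
        "--tool=callgrind",      # use the callgrind profiler plugin
        "--dump-instr=yes",      # record costs per instruction
        "--simulate-cache=yes",  # simulate the cache
        "--collect-jumps=yes",   # record jumps
        program,
    ] + list(arguments)


def run_callgrind(program, *arguments):
    """Run the program under callgrind and return the exit status.

    Assumes valgrind is installed; nothing is checked beforehand.
    """
    return subprocess.call(callgrind_command(program, *arguments))
```

For example, run_callgrind("./program", "arguments") runs the same command as the one shown above.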

Analyzing profiles

KCacheGrind is probably the best tool to visualize and analyze valgrind results (it can also display other profilers’ results).

When opening a profile, KCacheGrind may not recognize the associated source files. In that case, add their folder to the annotation folders.

Add source folders for better annotations

I think the most important graph KCacheGrind provides is the Callee Map. It can be colorized in different ways (by file, by class, …); the main point is that the Callee Map draws each function with an area that represents its weight in the program execution (weight being the number of instructions, cache misses, …). Unfortunately, in some cases KCacheGrind is not able to create any of the callee-related views. I don’t know why, but I got this on a RedHat 4 system with its associated KCacheGrind and the latest valgrind.

Figures: Callee Map colorized by class; Callee Map colorized by source file

Call graphs also show how much each function consumes. When you double-click on a function (in the call graph or in the Callee Map), it is “activated”: the original source code is shown (with jumps, if they were collected) along with the cost of each instruction, the functions that call it and the functions it calls. Another important thing is the difference between the self cost (sometimes called exclusive cost) and the inclusive cost. The former is the cost of the function alone; the latter is the cost of the function plus the cost of the functions it calls.
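The difference between the two costs can be shown with a few lines of Python. This is a sketch with invented numbers, not output from KCacheGrind: the function names and instruction counts are hypothetical, and it assumes a simple call tree with no recursion where each function has a single caller.

```python
def inclusive_cost(name, self_cost, callees):
    """Self cost of `name` plus the inclusive cost of everything it calls."""
    return self_cost[name] + sum(
        inclusive_cost(child, self_cost, callees)
        for child in callees.get(name, ())
    )


# Hypothetical self costs (say, executed instructions) for each function.
self_cost = {"main": 10, "parse": 40, "compute": 200, "sqrt": 50}
# Hypothetical call tree: main calls parse and compute, compute calls sqrt.
callees = {"main": ["parse", "compute"], "compute": ["sqrt"]}

if __name__ == "__main__":
    print(inclusive_cost("compute", self_cost, callees))  # 250 = 200 + 50
    print(inclusive_cost("main", self_cost, callees))     # 300 = 10 + 40 + 250
```

Here main looks cheap by its self cost (10) but accounts for the whole run by its inclusive cost (300), which is why both numbers are worth reading.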

Figures: Source and assembler with displayed costs and jumps; Different proposed costs

Conclusion

Valgrind and KCacheGrind are free tools that together let you profile an application. The combination is far from perfect, but it provides valuable information. Instrumentation- and sample-based profilers need a patched kernel (for Linux) or administrator rights (for Windows and Linux), and at the moment they can’t provide every cost, contrary to emulation.

by Matt at April 07, 2009 08:29 AM

April 05, 2009

Official SymPy blog

SymPy 0.6.4 released

SymPy 0.6.4 has been released on April 4, 2009. It is available at

http://code.google.com/p/sympy/

About SymPy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.

Major changes in this release:

  • robust and fast (still pure Python) multivariate factorization
  • sympy works with pickle protocol 2 (thus works in ipython parallel)
  • ./sympy test now uses our testing suite and it tests both regular tests and doctests
  • examples directory tidied up
  • more trigonometric simplifications
  • polynomial roots finding and factoring vastly improved
  • mpmath updated
  • many bugfixes (more than 200 patches since the last release)

The following 21 people have contributed patches to this release (sorted by the number of patches):

  • Ondrej Certik
  • Mateusz Paprocki
  • Fabian Seoane
  • Andy R. Terrel
  • Freddie Witherden
  • Robert Kern
  • Priit Laes
  • Riccardo Gori
  • Fredrik Johansson
  • Aaron Meurer
  • Alan Bromborsky
  • Brian E. Granger
  • Felix Kaiser
  • Kirill Smelkov
  • Vinzent Steinberg
  • Akshay Srinivasan
  • Andrew Docherty
  • Andrew Straw
  • Henrik Johansson
  • Kaifeng Zhu
  • Ted Horst

The following people helped review patches:

  • Riccardo Gori
  • Fabian Seoane
  • Vinzent Steinberg
  • Gael Varoquaux
  • Fredrik Johansson
  • Robert Kern
  • Alan Bromborsky
  • Ondrej Certik


There were 218 new patches since 0.6.3:

$ git log --pretty=oneline sympy-0.6.3..sympy-0.6.4 | wc -l
218

Plans for the future:

Our roadmap: http://wiki.sympy.org/wiki/Plan_for_SymPy_1.0

by Ondřej Čertík (noreply@blogger.com) at April 05, 2009 02:22 AM