by Sam's Place (noreply@blogger.com) at July 04, 2009 03:02 AM
In March, I set up a Redmine application with the Ruby webserver Webrick. Since then, I’ve migrated to Apache, and then the question was: which Ruby bridge module to use? The choice is not large: you have mod_fastcgi, mod_fcgid and mod_rails, a.k.a. Passenger. I tried all three, and only one was a success.
As in the last post about Redmine, I compiled everything (Apache included) in a custom location, and I start the server from there (without root rights).
mod_fastcgi is an old module, but it should be easy to use.
After setting up the configuration file to point to the public/dispatch.fcgi file, I started the server (did I say that Ruby does a wonderful job by providing every wrapper file needed for any webserver? It provides cgi and fcgi skeleton files, and of course everything needed for Webrick). Unfortunately, it didn’t work: it could not find the rubygems module, although it is installed, since I used it to install Ruby on Rails!
So I left mod_fastcgi where it was, in the dust.
mod_fcgid is based on FastCGI as well, but is more recent than mod_fastcgi. It uses Unix sockets for communication between Apache and the back-end, and in my configuration, the server didn’t seem able to create sockets (I didn’t find a reason on the Internet, but I have to say I didn’t look far; besides, Google does not have much to say about this issue, and I still had Passenger to try). So much for this module.
Passenger is supposed to interact directly with the Ruby files, and in fact it does. There is no need for a dispatch.fcgi file; it calls the server as Webrick does.
I installed rake as a gem, but the Passenger setup didn’t find it, so I had to add a symbolic link between lib/ruby/gems/1.8/bin/rake and bin/rake. I installed fastthread as well, and I was good to go.
In the configuration file, I load the appropriate modules:
LoadModule passenger_module /web/src/passenger-2.2.2/ext/apache2/mod_passenger.so
PassengerRoot /web/src/passenger-2.2.2
PassengerRuby /web/bin/ruby
Then I can configure the Virtual Host:
NameVirtualHost *:3000
<VirtualHost *:3000>
    DocumentRoot /src_custom/redmine-0.8.3/public
    <Directory /src_custom/redmine-0.8.3/public>
        AllowOverride None
        Order allow,deny
        Allow from all
        Options Indexes FollowSymLinks MultiViews
    </Directory>
    ErrorLog /web/logs/error.log
    LogLevel warn
    CustomLog /web/logs/access.log combined
    ServerSignature On
</VirtualHost>
With Passenger, it was really easy to have a working configuration. The only thing I’m still missing is the ability to set the RAILS_ENV variable to select a different environment than the default (production).
Philippe Baucour from the Université de Franche-Comté sent me an email saying that he was looking for the rare PhD candidate who would be able to do numerical modeling and material science on top of high-quality Python coding. I can sympathise with this quest: it is very hard to find someone who codes well, and harder still if, on top of that, you want them to be able to do numerical modeling!
His PhD proposal, translated from the French, is below. Contact him for more information. Please don’t contact me, I am drowning under (very interesting) e-mail.
PhD grant proposal
Theme 1.g: Modeling, simulation and high-performance computing
Theme 2.a: Energy, processes, environmental impacts, energy storage
Supervisors within the ENISYS department of FEMTO-ST, Modeling team
D. Hissel, M.C. Péra, R. Glises, Ph. Baucour.
The phenomena that take place within a PEMFC stack are multi-physics and multi-scale in nature. The behavior of a complete stack can therefore only be understood as a whole by integrating fields as different as:
All of these disciplines interact at completely different scales: from the catalyst deposit (i.e. ~µm) to the stack (i.e. ~m), a scale factor of about 10^6. In addition, the time constants of the various phenomena are also very different, which adds to the complexity of the problem.
There are a great many studies on fuel cell modeling, but the difficulties stated above lead to restrictions on the domain of study (a single cell), the geometry (1D or 2D, rarely 3D) or the representation of the phenomena (system-level modeling). Moreover, the computing power needed for this kind of strongly coupled, non-linear problem is not easily accessible.
The planned work consists in developing a complete 3D model of a stack at all scales of both time and space. The planned approach is to use a fractal model that can be partitioned and adapted to all the scales (time and space) present in a stack. Designing a modular code would eventually make it possible to test certain hypotheses about how PEMFCs operate. Examples include:
The laboratory (ENISYS) has recently acquired a computing cluster that makes a complete model conceivable. It consists of 8 compute nodes with a total of 64 processors, 64 GB of memory and 1 TB of disk space.
The objective of the thesis would be to develop a parallel code that distributes a complete model over the 64 cores. This model can be seen as the aggregation of models at different scales:
These models, relatively simple individually, will be combined to form a complete model. The difficulty lies in aggregating the various computations in both time and space; one speaks of spatial computing, or of parallel computing when a complex problem is distributed over several processors. For the modeling of a PEMFC stack, spatial computing is conceivable for the different spatial domains, but parallel computing will be needed to combine all the models and ensure convergence.
Specifications of the study:
• Definition of the stack under study, based on the available experimental data.
• Development of the computation codes, ensuring compatibility with running on a cluster.
• Development of a master model that collects the different models.
• Definition of the spatial and temporal partitioning.
• Validation against experimental data available in the laboratory.
Planned hardware and software:
Contact:
Dpt-ENISYS
Energie, Ingénierie des Systèmes
multiphysiques
Daniel Hissel
TechnHom, 90010 Belfort Cedex, FRANCE
Phone : 33 (0) 3 84 58 36 21
Fax : 33 (0) 3 84 22 27 22
@ : danieL.hissel@univ-fcomte.fr
Franche-Comté Electronique Mécanique Thermique et Optique - Sciences et Technologies
UMR CNRS 6174
Contact: Mr. Daniel Hissel
Head of the Modeling team
I recently sent a few thoughts on object-oriented design to a mailing list, so I might as well also be ridiculous on my blog.
I find that in object-oriented design, there are two kinds of objects:
| Topic | Yes | Neutral | No | Score | Rank |
| Advanced topics in matplotlib use | 18 | 10 | 2 | 16 | 1 |
| Advanced numpy | 18 | 10 | 2 | 16 | 2 |
| Designing scientific interfaces with Traits | 15 | 11 | 4 | 11 | 3 |
| Mayavi/TVTK | 13 | 11 | 6 | 7 | 4 |
| Cython | 14 | 8 | 8 | 6 | 5 |
| Symbolic computing with sympy | 15 | 6 | 9 | 6 | 6 |
| Statistics with Scipy | 9 | 15 | 6 | 3 | 7 |
| Using GPUs with PyCUDA | 13 | 7 | 10 | 3 | 8 |
| Testing strategies for scientific codes | 11 | 11 | 8 | 3 | 9 |
| Parallel computing in Python and mpi4py | 12 | 8 | 10 | 2 | 10 |
| Sparse Linear Algebra with Scipy | 9 | 12 | 9 | 0 | 11 |
| Structured and record arrays in numpy | 8 | 14 | 8 | 0 | 12 |
| Design patterns for efficient iterator-based scientific codes | 9 | 7 | 14 | -5 | 13 |
| Sage | 8 | 6 | 16 | -8 | 14 |
| The TimeSeries scikit | 4 | 13 | 13 | -9 | 15 |
| Hermes: high order Finite Element Methods | 6 | 9 | 15 | -9 | 16 |
| Graph theory with NetworkX | 5 | 9 | 16 | -11 | 17 |
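The post does not say how the Score column is derived, but every row is consistent with Score = Yes - No, with Neutral votes counting for nothing. The sketch below works under that assumption: it recomputes the scores and ranks from the vote counts in the table. The names `votes` and `score` are made up for the illustration, and ties are assumed to keep the order in which the table lists them.

```python
# Vote counts copied from the table above: (topic, yes, neutral, no).
votes = [
    ("Advanced topics in matplotlib use", 18, 10, 2),
    ("Advanced numpy", 18, 10, 2),
    ("Designing scientific interfaces with Traits", 15, 11, 4),
    ("Mayavi/TVTK", 13, 11, 6),
    ("Cython", 14, 8, 8),
    ("Symbolic computing with sympy", 15, 6, 9),
    ("Statistics with Scipy", 9, 15, 6),
    ("Using GPUs with PyCUDA", 13, 7, 10),
    ("Testing strategies for scientific codes", 11, 11, 8),
    ("Parallel computing in Python and mpi4py", 12, 8, 10),
    ("Sparse Linear Algebra with Scipy", 9, 12, 9),
    ("Structured and record arrays in numpy", 8, 14, 8),
    ("Design patterns for efficient iterator-based scientific codes", 9, 7, 14),
    ("Sage", 8, 6, 16),
    ("The TimeSeries scikit", 4, 13, 13),
    ("Hermes: high order Finite Element Methods", 6, 9, 15),
    ("Graph theory with NetworkX", 5, 9, 16),
]


def score(yes, neutral, no):
    """Assumed scoring rule: +1 per Yes, -1 per No, 0 per Neutral."""
    return yes - no


# sorted() is stable, so topics with equal scores keep their table order.
ranked = sorted(votes, key=lambda row: score(*row[1:]), reverse=True)

for rank, (topic, yes, neutral, no) in enumerate(ranked, start=1):
    print(f"{rank:2d}  {score(yes, neutral, no):4d}  {topic}")
```

Under this rule the Neutral column only shows how many respondents had no opinion; it never changes the ranking.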
There is no official C++ coding standard, unlike several languages (Java, Python, …) that have reference guides for code and design style, good practices, … None existed until this book, in which two world-renowned C++ authors set out the basis for your everyday development.
The content of the book is 101 coding standards, numbered from 0 to 100 (echoing the fact that C++ starts counting from 0).
The standards are split into several groups, from policy to type safety. Each time, the coding standard is stated, with a short summary and then a discussion. There can be an example if needed, and some references. The standard is always simple enough to follow, and the explanation is clear yet complete.
The issues covered are very broad and oriented towards common pitfalls. Use inheritance when needed, use collaboration elsewhere, do not inherit from a class that isn’t made for inheritance, … Even when you are familiar with these pitfalls (because a lot of C++ gurus talk about them in their forum posts, mails or blogs), you may sometimes forget them and write code that is not optimal (in several ways, performance or maintainability). In that regard, the book is a good way of having the good practices classified by topic and easily accessed: you don’t have to check or search on the Internet. Finally, if someone asks why you used a specific coding standard, you can give a full explanation and a context (and spread the good practices).
C++ coding standards are sometimes more a question of style than of language, but they are part of the general advice one should follow. C++ is a language that permits a lot of things, perhaps too many, and this set of rules makes it possible to write readable, efficient, robust code.
C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ In-Depth Series) (Paperback)
by Herb Sutter, Andrei Alexandrescu
ISBN: 0321113586
Last October I added a toolbar for Chaco plots. It was functional, but it wasn’t very pretty. I decided to rewrite it from scratch, with emphasis on improving the appearance and improving the auto-hide feature.
The new toolbar also employs a new feature of Enable: gradients! Gradient support is still a work in progress, but it is improving daily.

The numpy 1.3.0 installer for 64-bit Windows does not work very well. On some configurations, it cannot even be imported without crashing. The crashes are most likely due to bad interactions between the 64-bit mingw compilers and Python (built with Visual Studio 2008). Although I know it works, I had no interest in building numpy with the MS compiler, because gfortran does not work with VS 2008: the Fortran runtime from gfortran is incompatible with the VS 2008 C runtime (I get some scary linking errors).
So the choice is either building numpy with the MS compiler, with no hope of getting scipy afterwards, or building a numpy with crashes that are very difficult to track down. Today, I realized that I might get somewhere if I could somehow use gfortran without using the gfortran runtime (i.e. libgfortran.a). I first tried calling a gfortran-built blas/lapack from a C program built with VS 2008, and after a couple of hours, I managed to get it working. Building numpy itself with full blas/lapack was a no-brainer then.
Now, there is the problem of scipy. Since scipy has some Fortran code, which itself depends on the gfortran runtime when built with gfortran, I am trying to ‘fake’ a minimal gfortran runtime built with the C compiler. Since this mini runtime is built with the MS compiler and with the same C runtime as used by Python, it should work if the runtime is ABI compatible with the gfortran one. As gfortran is open source, this may not be intractable.
With this technique, I could go relatively far in a short time. Among the packages which build and pass most of the test suite:
- scipy.fftpack
- scipy.lapack
- some scipy.sparse
Some packages like cluster or spatial are not ANSI C compatible, so they fail to build. This should not be too hard to fix. The main problem is scipy.special: the C code is horrible, and many hacks are needed to build it. The Fortran code needs quite a few functions from the Fortran runtime, so this needs some work. But about 300 unit tests of scipy pass, so this is encouraging.


Greetings,
The conference committee is extending the deadline for abstract
submission for the SciPy 2009 conference by one week.
On Friday, July 3rd, at midnight Pacific, we will turn off the abstract
submission on the conference site. Up to then, you can modify the
already-submitted abstract, or submit new abstracts.
The SciPy 2009 executive committee
We see our EPD Webinar sessions as a great venue for us to provide subscribers with personalized support. Is there a particular challenge you’ve encountered while using EPD? Do you feel like it would be helpful for us to walk you through a process? We encourage you to submit your questions ahead of time so that we can prepare materials and demos to meet your needs.
The webinar format enables us to respond to your questions (either by chat or VOIP, depending on your preference) and share our screen to provide examples and demonstrations. We feel that this could become an invaluable channel of communication for EPD users, and are excited to see how it progresses.
July’s webinar will be held next Thursday at 1pm to accommodate the Independence Day weekend. We plan to give an overview and demonstration of parallel processing with IPython, as we’ve seen the tremendous utility of this EPD feature overlooked in the past. Once again, however, if you have a special topic that you’d like to have addressed, feel free to write us an e-mail to tell us what content you’d like to have covered in Thursday’s session.
EPD Webinar: Thursday July 2, 2009
1pm CDT/6pm UTC.
Register at GoToMeeting. A password to enter the webinar will be provided in your confirmation.
I bought this book as soon as it was published, and I sold it soon after. Suffice it to say I had a very mixed impression after reading it. There are good things in it, but also some very bad stuff. It doesn’t describe how to write your ultimate game engine, but the author’s game engine. What about some modesty?
Let’s start with the bad stuff.
This book does not show the engine architecture; the only things you can see are some pictures with some classes. There are hints of UML, but those pictures are far from being UML. Besides, the code quality is really disappointing. No const-correctness, no std::string for the parameters; char* are used and then converted to strings (!!!). Really bad coding practices. Another problem is that classes are instantiated before they are actually used, whereas the text says it shouldn’t be done! Finally, the book has plenty of pages of code, although the code is on the CD or on the Internet, so why use so many pages for something that is easily available?
On the good side, the book covers almost every aspect of a multiplatform game engine. From input to physics, graphics (OpenGL and DirectX) and AI, the handling of these systems is there. While some parts show only an abstraction over replaceable libraries (as for graphics), others show the actual implementation, as for physics. Speaking of physics, a good part of the book is used to explain how it works, even if it is not a fully-fledged physics engine at the time of writing.
I regret the fact that the book is about an incomplete and evolving engine. I also regret the script engine part, because it is somewhat wrong (compiled or interpreted, a script language can always use a virtual machine), and the pages could have been better spent on exposing the engine architecture…
In conclusion, this book was the first (to my knowledge at that time) complete book on a game engine. There are more complete books on parts of it (mainly 3D engines), but nothing on a complete engine. For this, it deserves some credit. Unfortunately, the drawbacks are too big to consider this book a viable option.
Ultimate 3D Game Engine Design & Architecture (Charles River Media Game Development) (Paperback)
by Allen Sherrod
ISBN: 1584504730
We are finally opening the registration for the SciPy 2009 conference. It took us time, but the reason is that we made careful budget estimations to bring the registration cost down.
We are very happy to announce that this year registration to the conference will be only $150, tutorial $100, and students get half price! We made this effort because we hope it will open up the conference to more people, especially students, who often have to finance such a trip on a small budget. As a consequence, however, catering at noon is not included.
This does not mean that we are getting a reduced conference. Quite the contrary: this year we have two keynote speakers. And what speakers: Peter Norvig and Jon Guyer! Peter Norvig is the director of research at Google and Jon Guyer is a research scientist at NIST, in the Thermodynamics and Kinetics Group, where he leads FiPy, a finite volume project in Python.
SciPy 2009, the 8th Python in Science conference, will be held from August 18-23, 2009 at Caltech in Pasadena, CA, USA.
Each year SciPy attracts leading figures in research and scientific software development with Python from a wide range of scientific and engineering disciplines. The focus of the conference is both on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python.
We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well as academic research face the challenge of mastering IT tools for exploration, modeling and analysis.
We look forward to hearing your recent breakthroughs using Python! Please read the full call for papers.
Update: I corrected the typo in the original blog post: the sprints are free, the tutorials are $100.
What an appetizing title! This book is part of an O’Reilly series that covers a lot of interesting topics. Compared to Beautiful Code, this one is much shorter, but the title suggests it is much more pragmatic.
Each time the author has something to say, he starts with a short phrase that can be unsettling, and a complete explanation follows. This means you should be ready to be shaken: what the author tries to teach you is sometimes something you have to think about twice (at least I had this feeling).
The book is split in two parts, theory and practice. The first is the longer one, with 4 chapters, although the second has more chapters, 10 in all.
So the first part is about theory. A lot of it makes sense, and even if you haven’t read the book, you should know these things. Acceleration is about how you can do things faster. It is less about development itself than about setting up your work environment (search utilities, shells, …) so that you can access things faster. Focus is about maintaining focus during development: fewer distractions so that you can stay focused (you need 15 minutes to reach maximum productivity, so being interrupted every 5 minutes is costly), and more space on your desktop with several monitors or virtual desktops (the book gives links to useful tools, like virtual desktops for Windows), … Automation is linked to acceleration, but goes further: it is about tasks you do several times. Different tools (bash, ruby, …) are used in the examples. What I mainly remember is that you need to master several different tools. The last chapter is Canonicality, or the DRY principle (Don’t Repeat Yourself). There are several cases where you seem to need the same thing in several places (when maintaining UML diagrams, for instance), and by using automation you can keep a single canonical version. This is spread over several pages, and although it is natural to use automation to achieve canonicality, you have to realize it first.
The second part is about practice, and more exactly which tools and processes you should use, based on the author’s experience. It seems to be mainly based on agile patterns, starting with test-driven development, followed by several topics on software architecture, which regularly come back in the following chapters (a good thing IMO). Two chapters are about the old times, one dedicated to what good can be extracted from past experiences, and one dedicated to bad experiences that keep on harming development (mainly people who think they know the truth without questioning themselves, the so-called angry monkeys). The last two chapters are about using the right language for the application and using the right tool (the IDE in this case) for maximum productivity. This part about the actual ways of being more productive achieves, IMHO, its goal, with good impulses for a productive programmer.
A lot of the tools presented in the first part are not free or open source; alternatives sometimes exist in the free software community, but you have to look for them. The second part tends to look more towards those free tools.
The whole book is definitely about good practices, some more controversial than others. Some of them are also difficult to apply in every language, as tools for the practice are missing. I think the book describes a good set of practices to try to apply personally; at least, I’ve decided to try, and to make better use of the command line and my IDEs (and also to fight angry monkeys).
The Productive Programmer (Theory in Practice (O’Reilly)) (Paperback)
by Neal Ford
ISBN: 0596519788
I had such a blast at the last public webinar that we did to promote Python for Scientific Computing. I am really looking forward to the next webinar which is only a week away (June 19). We had 100 people attend the last one. I know that some who wanted to attend could not because of a mix-up on times, or a problem with the fact that GoToMeeting doesn’t support Linux (I’m not very happy about that, but I don’t see another option right now). I apologize for all those problems, but hope you will try to attend again.
There is a lot that we could cover in these webinars, and I’m anxious for your feedback about what you would like to see. My plan is to put a schedule together so the topics are listed through the end of December after this next webinar. Now is the chance to make your opinion known if you’d like to steer these webinars in a particular direction. Schedules are busy and varied, so I’d like to give plenty of notice so that more people can attend the webinar they are most interested in.
In this upcoming webinar we are going to provide an introduction and demo of Chaco (which we didn’t get to the last time). If there is time, I will also continue the Mayavi demonstration (particularly the mlab interface) that we started last time, but I also wanted to show off EPDLab to a wider audience. You can register for the webinar at https://www1.gotomeeting.com/register/303689873
In the EPD subscriber webinar on June 5th, we discussed EPDLab (an open-source interactive Python environment included as part of EPD). Because EPDLab is a free and open-source project that anyone can participate in, contribute code to, and use as they would like, I think it deserves some attention at this next public webinar. Not only does it provide an enhanced scientific computing environment, it also provides an introduction to the Enthought Tool Suite (a free and open-source collection of tools for building compelling scientific applications — it goes by the abbreviation ETS).
I hope you will excuse a brief aside to clarify ETS and its relationship to EPD. Because we do sell a binary distribution of Python tools called the Enthought Python Distribution (EPD — which also happens to contain ETS), there is sometimes some confusion regarding the license and availability of ETS. ETS is a large BSD-licensed open-source collection of tools with a public SVN repository that anyone can contribute to and participate in the development of. Enthought has released a lot of code in that library which has made it possible for us to write sophisticated, compelling, and attractive scientific computing applications for our customers. ETS contains multiple separate projects. The most important and developed of these projects are Mayavi, Chaco, Traits, TraitsUI, and Envisage. You can learn more about ETS at Enthought’s open source portal.
But, Enthought is a small company and the majority of our marketing effort right now is centered around getting the word out about EPD and our other products and services like training and custom software creation. We don’t have the man-power to advertise ETS very well at the same time, and it can be a little confusing that EPD the distribution does cost money for commercial use, but ETS is free and open-source. Fortunately, people like Gael Varoquaux and Prabhu Ramachandran lead the internet charge to spread the word about the great tools in ETS.
I’m looking forward to seeing many of you on-line again at 1:00pm (Central Daylight Time) on Friday, June 19th. Slides and a recording of the webinar will also be made available here after conclusion of the webinar.
This TDD anti-pattern catalogue is truly excellent!
--titus
Since this post, Intel has officially released Parallel Studio. This is why I’ve published a new, up-to-date review here.
Our lab is seeking to hire an engineer to work on porting our machine learning code to the scikit learn, adding tests and documentation, and packaging it.
We are looking for someone motivated by quality in software and open source. No prior scientific computing experience is required. You will be working in a highly stimulating research environment (Neurospin), near Paris and employed by the French research institute in computer science and applied math (INRIA), a prestigious institution.
Neurospin is a research institute dedicated to the understanding of the brain. You will be working with the computer-assisted neurology laboratory, the image-analysis branch of Neurospin, in the small ‘Parietal’ INRIA team embedded in NeuroSpin and dedicated to statistical modeling.
Over the years, the lab has developed a set of tools for machine learning and statistical analysis in Python (with some C). Some tools for this purpose are already available in the open-source world (BSD-licensed) in the scikit learn. We want to extract the good and unique parts of our internal library and release them in the open-source world through the scikit learn. Our code is fully Python code, using scipy and matlab, with some bindings to R. As we want the code to be BSD-licensed, we will remove the bindings to R and replace them when possible. The job does not involve developing new algorithms, but testing, improving, and documenting the existing ones. There is a lot of quality assurance work to be done. The code needs to be brought up to the right coding standards, APIs should be cleaned up, and tests added. Dead code should be deleted. There is some optimization work to be done. Also, if any functionality duplicates the scikit learn, you should analyse both implementations and determine which one to keep. The job also involves working with the community, documenting the code, and releasing the project, including binary packages. And finally, all the original authors of the algorithms, experts in the field, are in the lab, so you will be able to learn from them and pester them if there is a problem with the code.
In a word, this is about transforming an internal project into a leading open source project that will rock and live on!
The job description is available here.
There are two caveats: first, it is a 2-year position. Second, you need to have graduated recently (how recently I don’t know exactly, but I will inquire).
If you are interested, or just want to ask questions, don’t hesitate to send me an e-mail. I am _really_ looking forward to collaborating with someone motivated on this project.
UPDATE: I have more details on the restrictions of the job offering: you need to have graduated in 2008 or 2009. This is a very hard restriction, and I am receiving many excellent CVs that I cannot even consider because of it. I am sorry, I cannot do anything about it.
I wanted to thank everybody who came to the first EPD Webinar, which was held today at 1:00pm CDT in our offices in Austin. We had a few technical glitches, which our team resolved quickly. I then spent about 30 minutes showing off new features of EPD such as EPDLab, indexed searching of docstrings with Whoosh, and the new curve_fit function from scipy.optimize. Dave Peterson then spent about 30 minutes showing the use of enpkg, a command in EPD that provides update, upgrade, and rollback service for egg packages. This tool should allow subscribers to EPD to keep up to date without having to download and install new installers every time a release is made.
A packaging tool like enpkg has been on the roadmap since the beginning, and it was encouraging to see it in action. There are still a few speed and “verbosity” issues that we are cleaning up, but it looks like a good start to what should be a very useful feature for EPD.
In the future, the EPD webinars will contain about 30-45 minutes of training material drawn from the 7 days of course material on scientific computing with Python that we teach regularly. If there are particular points you would like to see covered, please let us know at info@enthought.com. The current plan for the next EPD Webinar is to provide training on the statistical capabilities of SciPy.
All who attended had the chance to ask questions directly of the Enthought attendees. We look forward to answering more of your questions in the future. The next EPD webinar is scheduled for Friday, July 3 at 1:00pm CDT. If you are an EPD Subscriber at the Basic level, please register. We look forward to your attendance and questions. Feel free to send pre-webinar questions to info@enthought.com
The next public webinar for general scientific computing with Python is Friday, June 19 at 1:00pm CDT. This webinar is open to all who would like to attend. Right now the plan is to show off the open-source EPDLab and give an overview of all the tools that are brought together in EPD. You can sign up now for the event. I look forward to seeing many of you attend.
We’ve recently made the decision to start applying resources to generating x86_64 OS X builds of EPD. Because of limited resources, this means we’re officially dropping PPC support in EPD. It also means that it may take us months to get things released for the x86_64 (also known as amd64) architecture.
As an example of some of the issues we’ll face, we’ll need to decide how to handle the GUI backend situation. You see, the wxWidgets project hasn’t yet released 64-bit build support for OS X’s Cocoa framework, and the Carbon framework isn’t 64-bit, so we’re stuck either starting with a “server” / console build of EPD, shipping on an unreleased version of wxWidgets, delaying the release while we help finalize x86_64 Cocoa builds of wxWidgets and wxPython, or switching to a different backend like Qt.
While Qt and PyQt (Python bindings for Qt) seem like a no-brainer technology-wise, the license situation is a hurdle for us to overcome. We’ve tried hard, but haven’t always succeeded, to avoid GPL licensed projects in EPD in order to make it more palatable to commercial users, like even our own consulting projects. And, yes, Qt itself recently came out with an LGPL license option that would suit EPD’s needs, but PyQt isn’t similarly licensed (yet). So now we have to decide whether it would be acceptable for such a core capability (the only GUI backend of OS X x86_64) to be GPL licensed.
If anyone has any thoughts or suggestions on how to resolve this issue, please don’t hesitate to let us know!
By the way, regarding the PPC situation, we have effectively already started to drop PPC support with the EPD Py25 v4.3.0 release. We made a good faith effort to build in the PPC support but simply didn’t do significant testing of the results. In the end, it turns out that at least one core module, SciPy, ended up with binaries that don’t fully support PPC. Sorry, but we do not plan to issue fixes for this.
If you’re a PPC user, the last working version of EPD for PPC was EPD Py25 v4.2.30201.
Strategy games are the type of games I prefer. Turn-based or real-time, they share some common ground. This book tries to explain them.
The book is split in three parts: war, peace and design.
War is generally at the center of a strategy game (war, or tensions that can lead to it). War consists of units fighting together. Equilibrium is something that is not easy to achieve, and the game must suggest different strategies with those units to remain interesting. Relevant facts about those topics are introduced.
The Peace chapter is not only about peace, which is perhaps only a truce before the war. It’s about resources, technologies and the like. The book does not mention that too many different resources can also lead to a bad game (too complicated to handle, or ruined by an AI that cannot cope with them, …).
Finally, almost half of the book is dedicated to design. Not only interface design, but also game and gameplay design. Do I need a hero? How far does the world stretch? These kinds of questions lead to really different games, and a game that aimed too big can end up a failure because of them.
This small book can be read in a few hours. It relies mainly on actual game images, but their captions are printed too small, especially when they convey useful information. This problem aside, the book is really enjoyable; it does not go into too much detail and only aims to give a first overview of what a strategy game should be.
Game Guru: Strategy Games (Paperback)
by Dave Morris
ISBN: 1592002536
I'd like to find an MSU student to report semi-monthly on python-dev. The student would be responsible for monitoring the python-dev mailing list and active PEPs, summarizing substantive discussions in a public forum, and integrating feedback from the community. This would be a 1 credit CSE independent study course (CSE 490). Additional effort (for more credits) could be applied towards building and maintaining a CMS site to store and reference past and present summaries, or integrating reviews of new modules.
The ideal student would be someone who communicates well in writing, is interested in technical reporting, and has some basic experience with programming. Python experience (CSE 231) is a plus.
Please send a brief summary of interests together with a sample of writing to ctb@msu.edu.
--titus
Apparently the ipaddr module in Python 3.1 is disliked by some, and there was a reasonably robust discussion on python-dev about how it's wrong, wrong, wrong. Guido finally ruled: ixnay on the addr-pay.
This is pretty relevant given the twitstorm caused by Zed Shaw's ludicrously self-confident rants about how he always knows best and is a kickass programmer and oh, by the way, the Python stdlib is kinda lousy in places. I think the thing to take away from Zed's rant is that the Python module addition process is, in fact, moderately FUBARed, with some people able to add perhaps ill-considered modules while others have to struggle to get the time of day. (Aahz's solution is good -- require a PEP.)
It's relevant personally, too, as I dig my way through some of pygr's modules. It's way easier to add code than it is to refactor it, especially if you don't have a lot of unit tests; if you want to retain backwards compatibility, you're basically doomed. DOOOMED, I say! And that's why the Python stdlib has so many issues.
(Incidentally, nothing against Zed Shaw -- obnoxiousness is his public persona, and he's definitely worth listening to -- but it is funny to realize that all his articles contain arguments that boil down to "he always knows best and is a kickass programmer." I especially liked his statistics rant.)
--titus
I promised to post an update once I found a solution to the problem I had some months ago when I tried to use the latest MKL with numpy. Well, there is a simple hack that does the trick. It is far from perfect, but at least the tests pass now.
So the only thing you have to do is to export the LD_PRELOAD variable:
export LD_PRELOAD=/path/to/the/MKL/lib/libmkl_core.so
We’ve just updated the EPD product website with links to a new EPD repository RSS feed: http://www.enthought.com/products/epd-rss.xml.
This feed is updated every time we release an update / upgrade to a project included in EPD, and soon every time we publish a new installer. All entries in the feed specify the platform the update / upgrade was released for, the date and time of release, the name of the thing being updated / upgraded, and a short description of the project being updated.

We’ve added this feed to both the main EPD product page, and also the ‘Download’ page — they’re the same feed so no need to subscribe twice! For those with browsers that recognize RSS declarations in the page/HTML headers, you’ll see the RSS feed icon in your browser’s URL field when visiting these pages. For others, simply click on the explicit RSS link on the main page right under the big, red, “Download Now” button.
As an alternative to this all-platforms feed which is available to everyone, even those without an EPD subscription at all, current EPD subscribers with access to the repository can subscribe to a filtered RSS feed for their target platform(s). These can be found within the various platform-specific directories under the top-level ‘eggs’ directory. Note that only those with ‘Basic’ subscriptions and above can access the repository.
Please don’t hesitate to send us suggestions on how we can continue to improve EPD!
Enthought’s first public webinar was a success, despite a few technical glitches. The large attendance in spite of the short notice was gratifying. Travis was able to cover only a fraction of the material he had hoped to cover, so there is plenty of material for future sessions (such as Chaco and Mayavi).
The recording of the webinar is now available for download. While the native recording format of GoToWebinar is Windows Media Player, we have converted it to Matroska format, so we hope that folks on all platforms will be able to view it. Please let us know if you have problems getting or playing it.
The next public webinar will be Friday, June 19 at 1:00pm CDT. Specific topics TBD.
Meanwhile, we are launching a second webinar series, exclusively for subscribers to the Enthought Python Distribution at the Basic support level or higher. Those subscribers will receive an e-mail announcement shortly.
IronPython is the first dynamic language developed for the .Net platform. At first, .Net didn’t support this kind of language. This is something that keeps coming back throughout the book: you have to use some additional tricks to unleash the combined power of .Net dynamic and static languages.
The book starts with a general introduction to IronPython. A quick review of the language itself is followed by the use of the .Net assemblies. At the end of this part, one is comfortable enough to write some small IronPython programs.
The next part is dedicated to what IronPython offers thanks to Python and to its .Net affiliation. The authors go through standard Python (batteries included) and the corresponding .Net assemblies (some arguments for using one or the other would have been a big plus), depending on what must be done. Because of (or thanks to) .Net, several pages are dedicated to XML, as it is needed to simplify the description of UIs. Several useful design patterns are also presented with the .Net approach.
The next part starts with WPF, the official graphical interface, with several ways of using it (bridge from C#, XAML, …). Then WMI (used for system administration) is handled, but from my point of view, it is the weirdest part. WMI has its own language which does not seem like C# or Python. Besides, PowerShell, presented as well as a way of doing system administration, has its own language. There is a book dedicated to PowerShell, so only the communication between IronPython and PowerShell is handled. So two additional languages in this chapter, perhaps too many (they are limited to this chapter).
IronPython is a .Net language, so it is possible to do ASP with it. A chapter deals with this approach; it is well written, but you need to follow the associated example in your favorite IDE if you want to understand what’s happening. Web also means web services and databases, handled in one chapter. The basics of SQL tools are addressed, as well as basic web services (mainly REST). I have to say that there are some mistakes there, as SOAP is not only used with POST HTTP requests but also with GET requests (it can be seen in the official W3C specification) and also with transport protocols other than HTTP. Perhaps these are limitations of the .Net implementation, in which case it should have been mentioned. Finally, Silverlight integration allows developing light clients that can interact with other languages as well as with the web page.
Throughout the book, complete interaction with other .Net languages was not addressed. It is the goal of the last part to show how assemblies can be used in IronPython and how IronPython scripts can be used from .Net static languages. As I’ve said, the interaction does not go completely smoothly, there are several solutions to accomplish it. At least, the book does not only speak about the upcoming .Net 4.0 that will help this interaction.
In conclusion, those who need a dynamic language (to script an application) can go for IronPython, the first dynamic language for the .Net framework, compatible with the Python 2.5 language, and in that case, go for this book, which will help you with just about anything.
IronPython in Action (Paperback)
by Michael Foord, Christian Muirhead
ISBN: 1933988339
Price: USD 29.69
I have heard several times that every Linux distribution should have the same package manager (the implication being that there is one too many among rpm and deb), and it was mentioned once again recently in a well-publicized video (see the linux hater blog).
The argument goes as follows: doing packaging takes time, and making packages for every distribution is a waste of time. If every distribution used the same package system, it would be much better for 3rd party distributors. Many people answer that competition is good, having many distributions is what makes Linux great – [insert usual stuff about how good Linux is].
While it is true that multiple package systems mean more work, saying that there should only be one is kinda clueless – I wonder if anyone pushing for this has even done any rpm/deb packaging. What makes deb vs rpm a problem is not that they are different formats, like say zip vs gzip, but that they are deployed on different systems. A RHEL rpm won’t work great on Mandrake, and even if a lot of Debian .deb packages work on Ubuntu, it is not always ideal. The problem is that each distribution-specific package needs to be designed for the target distribution. To build a rpm or a deb package, you need:
Basically, almost everything which makes the difference between distributions A and B! For file locations, the LSB tries to standardize things, but some details still differ, like where to put 64 vs 32 bits libraries. One distribution may have libfoo 1.2, another one 1.3, so even if they are compatible, you can’t use the same package for every distribution. Or some libraries do not have the same name under different distributions.
So requesting the same package manager for every distribution is almost equivalent to asking that every distribution should be the same. You can’t have one without the other. You can argue that there should be only one distribution, but don’t forget that Ubuntu appeared like 5 years ago.


We are trying something new at Enthought. I’m going to host the first Enthought Webinar on Scientific Computing. This webinar is free for people interested in showing up. You should plan to come with a bit of patience as we may not have all the wrinkles worked out of the technology and what it means for having a discussion.
I want to spread the word about all the very cool tools that the Open Source community has produced for using Python in Science. The first webinar will be on Friday at 3:00pm CDT. You can attend via your computer by registering at the following link: Python for Scientific Computing Webinar. I will be talking about NumPy, structured data-types, and memory mapped arrays (and how to use them for reading data quickly from files). I will also be showing off Chaco for 2-d interactive visualization and Mayavi for 3-d visualization. Come with questions as you will have the opportunity to ask them if you would like.
We will start 15 minutes early for people who want to get help setting up with the Webinar technology from GotoMeeting. If you have never attended a Webinar before, you may want to come and try it out. I look forward to seeing many of you online this Friday.
Some months ago, I decided to dig into raytracing, and more precisely interactive raytracing. So I started writing my own library, based on several publications.
nVidia recently announced its own framework, and Intel also wants to do raytracing on Larrabee: it is the current trend.
First, raytracing is a tool to draw a picture from a scene. It’s like a camera observing the scene, and to know the content of each pixel on the film, a ray is cast through the scene. Depending on the objects, rays will propagate to other objects or lights, and then the actual color of the camera pixel can then be computed.
A raytracer is in fact really easy to write. The math formulae are simple for a version that will return satisfactory results. On the contrary, an interactive or “real-time” one is more complicated. This is why I started my small project and why I will blog from time to time about it.
Interactive or real-time means that the drawing must be fast, really fast. This also means that realistic drawing is out of the question. I’ve started with a simple raytracer, with some optimizations, and then I’ve added some algorithmic optimizations. I will talk about all this in several posts.
In physical vision, light rays travel around the world from the sources, hitting objects, then reflected or transmitted and finally arriving at the eye (or camera). The principle of raytracing is to do it backwards, from the camera to the light sources. The camera film, or the retina, is symbolized by a screen.
In this case, the emitted ray hits object 1 at some angle to the normal (the normal vector is the vector orthogonal to the tangent plane of the object). Object 1 receives some light, so the color carried by the ray depends on this light, on the object, and on the angle between the normal and the direction of the light.
The ray is then reflected towards object 2 and hits it at another angle. Like object 1, object 2 receives some light, and the reflected ray carries a specific color. This color is then blended with the color of the primary ray, and the result is the color of the pixel on the screen.
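The first step described above (one primary ray per pixel, with a color that depends on the angle between the normal and the light direction) can be sketched in a few lines of Python. This is only an illustration and not the code of the C++ library: the scene (a single sphere), the helper names and the Lambert-style shading are assumptions made for the example, and reflected rays are left out.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c          # quadratic with a == 1 (unit direction)
    if disc < 0.0:
        return None                 # the ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None   # ignore hits behind the camera

def shade(origin, direction, center, radius, light):
    """Grey level in [0, 1] from the angle between normal and light."""
    t = hit_sphere(origin, direction, center, radius)
    if t is None:
        return 0.0                  # background
    point = tuple(origin[k] + t * direction[k] for k in range(3))
    normal = normalize(sub(point, center))
    to_light = normalize(sub(light, point))
    return max(0.0, dot(normal, to_light))

def render(width, height, center=(0.0, 0.0, -3.0), radius=1.0,
           light=(5.0, 5.0, 0.0)):
    """Cast one primary ray per pixel through a screen at z = -1."""
    camera = (0.0, 0.0, 0.0)
    image = []
    for j in range(height):
        row = []
        for i in range(width):
            x = (i + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (j + 0.5) / height * 2.0
            direction = normalize((x, y, -1.0))
            row.append(shade(camera, direction, center, radius, light))
        image.append(row)
    return image
```

Calling render(8, 8) returns a small grid of grey levels; a real raytracer would add the secondary and shadow rays on top of this loop.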
I’ve written my library in C++, tested it with several compilers on different platforms. So as to test the code, to profile it and to create a scene in an elegant manner, I’ve added a small Python wrapper. This will be the subject of my next post.
Then, I’ll go through secondary and shadow rays, a GUI for the raytracer, acceleration structures, …
The code is released under the LGPL on Launchpad.
Interactive Raytracer on Intel website
An Intel article on real-time raytracer architecture
nVidia presentation on raytracing
OK Folks, I know that planet.python.org and planetpython.org underwent a merger, and during the merger a new, or patched, or somehow upgraded version of planet went into effect on both. However, I cannot find a link to the info post any more.
I would like to put the latest stable version of PlanetPlanet into effect on the Google Summer of Code/Python site but I am wary of using the devel repo without any inside info. (I am currently running 2.0, which is the latest official release.)
Should I use the dev repo, or should I track down whatever version planet.python.org is using?
thanks!
--titus
On May 30th and 31st, the French Python conference, Pycon FR, will be held at ‘la Cité des sciences’, la Villette, in Paris.
The first day, I will be giving a one-hour-long tutorial (in French) on numpy, scipy, and all the Python for Science jazz. On the following day, I will be giving a half-hour-long talk to illustrate the use of Python in my current work: statistical analysis and modelling of brain activity.
I’ll be giving my tutorial in one room, while David Larlet (the famous Biologeek) will be giving one on Django in another room. Tough competition.
The program of the conference is very eclectic, ranging from general programming talks, to GUIs or web development. While this might deter the pure scientific computing folks, I strongly encourage you to attend. Indeed, a lot of the development, packaging, quality assurance, … problems encountered in scientific computing are universal in computing.
You might think that you are only interested in writing algorithms, or processing data, but this code will have to live on. My experience is that it is terribly hard to have code in a lab that can be shared to some extent and live on when people move away to another lab, or stop having time to maintain the code. Talks like
can probably be of some use.
Also, don’t underestimate the fact that some other communities might have solved some of the issues you struggle with. When dealing with real-world problems, and not only developing algorithms on a few sets of test data, a large fraction of the code lines are related to IO, interfaces, data massaging… Two years ago, I remember that I was not terribly interested in the web-development talks. I tried to be open-minded and listen to them, but… Now I have done a bit of web development myself, and I have played with some of the famous ‘web frameworks’. I can tell you, there are some really interesting concepts there. The web guys have managed to extract a set of patterns from the problems they face and provide excellent abstractions for data handling and display. Can we learn from them? I am especially interested in getting more insight from things like ORMs (object relational mappers), and understanding better the web frameworks:
And finally, one more reason to come: it is so nice to actually get to meet people in real life, and have a chat.
So, see you there, for those who live in France.
Just submitted this on Thursday:
Next generation sequencers are beginning to impact agricultural biology. Over the next few years, next generation sequencing will produce incredibly large datasets that will address structural (e.g., SNPs, CNVs, indels, methylation, translocations) and functional (e.g., RNA expression, transcription factor binding sites) variation in genomes that will provide detailed insights that could explain phenotypic variation. Despite this immense power, next generation sequencing in agricultural animals will not be used effectively due to the lack of easy-to-use computational tools to support data analysis, and the unique needs of agricultural animal genomes. We propose to build an easy-to-use Web interface that incorporates several existing mapping and post-mapping analysis programs for next generation sequencing data that will greatly empower agricultural researchers. We will also provide solutions to issues such as unfinished and unannotated assemblies, private data sets, private annotations, etc. Our tools will give individual investigators or small groups with no computational support the power to utilize and interpret next generation sequencing data.
Any guess as to the funding agency? Yep....
The exciting life of a professor continues!
--titus
I just submitted a Mellon Award for Tech Collaboration nomination for the Python Buildhaus. What's that, you ask?
The Python Buildhaus is a project to systematically build, test and release Open Source Python packages on Windows, Mac OS X, and a wide array of other UNIX architectures and operating systems (see snakebite.org for list). In addition to providing machine access, software support, and process support, we hope to create a set of best practices and process documentation to help the community address cross-platform compatibility issues. We will also build tools to extend the impact of this effort beyond Michigan State by providing longer-lasting developer resources, e.g. tools to auto-build Python eggs and installers across multiple platforms.
This will be an open resource for the Python community.
See the Python Buildhaus and our proposal.
This is basically an attempt to use Snakebite to push specifically to help with the cross-platform distribution problem.
--titus
A few of us at Enthought (Peter Wang, Robert Kern, and I) traveled to Toronto two weeks ago to attend a very interesting summit of scientists and others connected to finance and economics to discuss whether and how science can provide assistance in understanding economics sufficiently to prevent or at least mitigate economic breakdowns such as the one we’ve just experienced (and are still dealing with). The conference was titled The Economic Crisis and its Implications for the Science of Economics. Some background material for the conference can be read at Edge.org, and at least two blog-posts covering the conference can be read: one by Stephen Hsu and another by Barkley Rosser.
We were invited because Eric Weinstein is a fan of Python and the tools in the Enthought Python Distribution (including NumPy, SciPy, SymPy, Mayavi, and Chaco). Robert Kern produced some very nice visualizations for Eric’s talks in the conference using Mayavi and Chaco, which can be seen in Eric Weinstein’s two talks: 30 minutes into the first one and 45 minutes and again 1:30 minutes into the second one. (Actually, the second talk was Pia Malaney’s talk and Eric enthusiastically joined her half-way through; I guess being married has its advantages for getting more air time.)
The conference was intellectually stimulating and very enjoyable. I enjoyed all of the conversations I personally had with the participants which ranged from probability theory to cognitive neuroscience to quantum mechanics to computer platforms for agent-based modeling. I encourage you to read and listen in more depth to what the participants had to say in their talks because I won’t be able to provide sufficient summary to the conference. All of the conference talks are online. What isn’t shown in the videos, though, are the break-out discussions that took place between sessions and at meal-time.
In these break-out discussions I enjoyed getting to know all four members of the PartEcon team. Apparently, Mike Brown organized this group after an agent-based model (discussed at the conference by Alexander (Sasha) Outkin) predicted some useful results of changing the tick-size to decimals on the NASDAQ. They have incorporated principles of double-entry book-keeping into their agent-based model. They also stayed after the conference to continue comparing notes with another team from the Perimeter Institute that had written about an agent-based model using a more formal setup (by Samuel Vazquez and Simone Severini).
While there was one early talk on the first day by Richard Alexander that touched on the genetic component of human agents, the impact of having evolutionary biologists present (like him and one of his students, Bret Weinstein) was much larger than their presentation footprint. They provided insightful discussions during several break-out sessions (Peter Wang even commented that in another life he might have become a biologist).
Lee Smolin sent around a very nice summary of the conference and suggested a unifying theme of “path-dependence in economic dynamics.” Eric and Lee were both there to explain how gauge theory provides the tools to solve the problem of changing preferences that has plagued traditional academic economics. Eric did a great job of showing how this manifestly untrue concept of unchanging preferences has at least been put forward by several leading economists. It’s still unclear to me whether or not gauge theory actually provides new results, but it definitely seems like a more useful mathematical toolbox to use and build from.
I was disappointed that amidst all the discussion of the failure of economic modeling there was not at least some discussion about the Mises-Rothbardian ideas of fiat currency and fractional-reserve banking being the primary source of the booms and resulting busts. I wanted to learn from the people there rather than try and debate this one particular theory of economics, so I pretty much stayed quiet. One gentleman sitting next to me during the first day asked the panel whether the crisis shows the failure of fiat currency and got a very unsatisfying answer from Nouriel Roubini, who simply dismissed the question without really addressing it.
Given that the economic experts have basically shown repeatedly they don’t know what they are doing, intellectual honesty would seem to me to require listening to all sides of a debate, instead of dismissing a whole theory of economics (such as the Austrian school) primarily because it doesn’t use math as its starting point. Fortunately, there are very good texts that argue against fractional-reserve banking and the role it may actually play in causing economic instability. One of them is “Money, Bank Credit, and Economic Cycles” by Jesus Huerta de Soto.
I really enjoyed the conference because it seemed to combine all of the interests I’ve developed over the years: math, probability theory, neuroscience, economics, and computers. I’ve had a hobbyist interest in Economics ever since graduate school at the Mayo Clinic when I was learning about Linux and Python. I fell in love with open source software but wanted to understand how “giving software away” could work sustainably in a society. It was this question that led to me finally reading Mises and Rothbard and a whole host of other non main-stream economists. I can’t say I’ve figured anything out, but I have very much enjoyed the ride.
I’m also very hopeful in some of the ideas I saw at the conference that may help us inch closer to an understanding of the truth of an economic system (mathematically modeling changing preferences, using agent-based models, and even the idea of local currencies that was discussed among some at the conference).
In the more immediate future, it looks like there is some discussion afoot for building a platform for agent-based modeling in which I hope Python plays a prominent role. There is a real power in using an expressive and dynamic language like Python that allows for rapid development. It is a general-purpose language that scientists and engineers can actually get excited about. In addition, the work of Paul Borrill’s company (Replicus) in creating an agent-based storage solution looks immediately promising. Perhaps Enthought can provide some tools to assist in managing such a system. I’m enthused and anxious to continue to support the improvement of using computers to help solve some of the world’s most challenging problems. There is much more that could be said, but I’m sure this blog (with no photos) is long enough.
As I have already written in a previous post, I have moved away from bzr to git for most of my software projects (I still prefer bzr for documents, like my research papers). Many if not most comparisons of git with other tools focus on speed. True, git is quite fast for source code management, but I think this kind of misses the point of git. It took me time to appreciate it, but one of git’s killer features for source code control is the notion of content tracking. Bzr (and I believe hg, although I could not find good information on that point) uses file ids, i.e. it tracks files, and a tree is a set of files. Git, on the contrary, tracks content, not files. In other words, it does not treat files individually, but always internally considers the whole tree.
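To make "tracks content, not files" a bit more concrete: git names the content of a file (a "blob") by a SHA-1 hash of its bytes, and the file name is stored separately, in the tree that references the blob. Here is a minimal Python sketch of that blob id computation; the function name is mine, and this only illustrates the naming scheme, not how git stores or compares history.

```python
import hashlib

def git_blob_id(data):
    """Id git gives to file content: SHA-1 of a small header plus the bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# The file name never enters the computation: two files with the same
# content, wherever they live in the tree, share the same blob id.
content = b"double acc = 0;\n"
id_in_sum_c = git_blob_id(content)
id_in_common_c = git_blob_id(content)
```

The result should match what `git hash-object` reports for a file with the same bytes.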
This may seem like an internal detail, and an annoyance because it leaks at the UI level quite a lot (the so-called index is linked to this). But it means that git can record the history of code, rather than of files, quite accurately. This is especially visible with git blame. One example: I recently started massive surgery on the numpy C source code. Because of some C limitations, the numpy core C code lived in a couple of gigantic source files, and I split them into more logical units. But this breaks svn blame heavily. If you just rename a file, svn blame can follow the rename. But if you split one file into two, it becomes useless. Because git tracks the whole tree, the blame command can be asked to detect code moves across files. For example, git blame with rename detection gives me the following on one file in numpy:
dc35f24e numpy/core/src/arrayobject.c 1) #define PY_SSIZE_T_CLEAN
dc35f24e numpy/core/src/arrayobject.c 2) #include <Python.h>
dc35f24e numpy/core/src/arrayobject.c 3) #include "structmember.h"
dc35f24e numpy/core/src/arrayobject.c 4)
65d13826 numpy/core/src/arrayobject.c 5) /*#include <stdio.h>*/
5568f288 scipy/base/src/multiarraymodule.c 6) #define _MULTIARRAYMODULE
2f91f91e numpy/core/src/multiarraymodule.c 7) #define NPY_NO_PREFIX
2f91f91e numpy/core/src/multiarraymodule.c 8) #include "numpy/arrayobject.h"
dc35f24e numpy/core/src/arrayobject.c 9) #include "numpy/arrayscalars.h"
38f46d90 numpy/core/src/multiarray/common.c 10)
38f46d90 numpy/core/src/multiarray/common.c 11) #include "config.h"
0f81da6f numpy/core/src/multiarray/common.c 12)
71875d5c numpy/core/src/multiarray/common.c 13) #include "usertypes.h"
71875d5c numpy/core/src/multiarray/common.c 14)
0f81da6f numpy/core/src/multiarray/common.c 15) #include "common.h"
5568f288 scipy/base/src/arrayobject.c 16)
65d13826 numpy/core/src/arrayobject.c 17) /*
65d13826 numpy/core/src/arrayobject.c 18) * new reference
65d13826 numpy/core/src/arrayobject.c 19) * doesn't alter refcount of chktype or mintype ---
65d13826 numpy/core/src/arrayobject.c 20) * unless one of them is returned
65d13826 numpy/core/src/arrayobject.c 21) */
You can notice that the original file can be found for every line of code in the new file. The original author and date may be found as well, I just removed them for the blog post.
This is truly impressive, and is one of the reasons why git is so far ahead of the competition IMHO. This kind of feature is extremely useful for open source projects, much more than rename support. I am ready to deal with quite a few (real) Git UI annoyances for this.
It looks like my example was not very clear. I am not interested in following the renames of the file: in the example above, the file was not arrayobject.c first, then renamed to multiarraymodule.c, and later to common.c. The file was created from scratch, with content taken from those files at some point. You can try the following simplified example. First, create two files prod.c and sum.c:
#include <math.h>

double sum(const double* in, int n)
{
    int i;
    double acc = 0;

    for(i = 0; i < n; ++i) {
        acc += in[i];
    }

    return acc;
}

#include <math.h>

double prod(const double* in, int n)
{
    int i;
    double acc = 1;

    for(i = 0; i < n; ++i) {
        acc *= in[i];
    }

    return acc;
}
Commit to your favorite VCS. Then, you reorganize the code, and in particular you put the code of both files into a new file common.c. So you create a new file common.c:
#include <math.h>

double prod(const double* in, int n)
{
    int i;
    double acc = 1;

    for(i = 0; i < n; ++i) {
        acc *= in[i];
    }

    return acc;
}

double sum(const double* in, int n)
{
    int i;
    double acc = 0;

    for(i = 0; i < n; ++i) {
        acc += in[i];
    }

    return acc;
}
And commit. Then, try blame. Rename tracking won’t help at all, since nothing was renamed. In this very simple example, you could improve things by first renaming, say, sum.c to common.c, then adding the content of prod.c to common.c, but you would still lose the fact that the prod function comes from prod.c. git blame -C -M gives me the following:
^ae7f28a prod.c 1) #include <math.h>
^ae7f28a prod.c 2)
^ae7f28a prod.c 3) double prod(const double* in, int n)
^ae7f28a prod.c 4) {
^ae7f28a prod.c 5) int i;
^ae7f28a prod.c 6) double acc = 1;
^ae7f28a prod.c 7)
^ae7f28a prod.c 8) for(i = 0; i < n; ++i) {
^ae7f28a prod.c 9) acc *= in[i];
^ae7f28a prod.c 10) }
^ae7f28a prod.c 11)
^ae7f28a prod.c 12) return acc;
^ae7f28a prod.c 13) }
^ae7f28a sum.c 14)
^ae7f28a sum.c 15) double sum(const double* in, int n)
^ae7f28a sum.c 16) {
^ae7f28a sum.c 17) int i;
^ae7f28a sum.c 18) double acc = 0;
^ae7f28a sum.c 19)
^ae7f28a sum.c 20) for(i = 0; i < n; ++i) {
^ae7f28a sum.c 21) acc += in[i];
^ae7f28a sum.c 22) }
^ae7f28a sum.c 23)
^ae7f28a sum.c 24) return acc;
^ae7f28a sum.c 25) }
hg blame on the contrary will tell me everything comes from common.c. Even when using the rename trick, I cannot get more than the following with hg blame -f -c:
81c4468e59f9 sum.c: #include <math.h>
81c4468e59f9 sum.c:
81c4468e59f9 sum.c: double sum(const double* in, int n)
81c4468e59f9 sum.c: {
81c4468e59f9 sum.c: int i;
81c4468e59f9 sum.c: double acc = 0;
81c4468e59f9 sum.c:
81c4468e59f9 sum.c: for(i = 0; i < n; ++i) {
81c4468e59f9 sum.c: acc += in[i];
81c4468e59f9 sum.c: }
81c4468e59f9 sum.c:
81c4468e59f9 sum.c: return acc;
81c4468e59f9 sum.c: }
3c1ac7db76ba common.c:
3c1ac7db76ba common.c: double prod(const double* in, int n)
3c1ac7db76ba common.c: {
3c1ac7db76ba common.c: int i;
3c1ac7db76ba common.c: double acc = 1;
3c1ac7db76ba common.c:
3c1ac7db76ba common.c: for(i = 0; i < n; ++i) {
3c1ac7db76ba common.c: acc *= in[i];
3c1ac7db76ba common.c: }
3c1ac7db76ba common.c:
3c1ac7db76ba common.c: return acc;
3c1ac7db76ba common.c: }

I got my hands on an old edition of this book (the second edition; the third is now available), and it seemed to me a good place for game developers to start.
Mike McShaffry has a lot of experience in the game industry, and his goal is to share it with the readers. Every chapter contains anecdotes from his past, and it is a lot of fun to see studios falling into the same pitfalls as we do when we start coding.
The book is split into four parts. The first one starts with the fun you can get from coding a game, but also the troubles you will have. And what technology will you use? 2D? 3D? And what do they imply? As with any code, there is a set of general good practices (memory handling, scripts, …) that need to be addressed. The author sometimes did not follow them, and there are examples where this caused trouble.
To get the game running on a computer, another set of rules is needed. Without them, it is hard to have a running game in the end. How is the game built (not everybody uses a tool to automatically build the game)? How does the player interact with the game? A lot is written about this last issue, and as the author is used to Direct X, the solutions are explained with it. But the advice can be used with other technologies: one just has to find the equivalent functions in the other framework. Obviously, the book cannot cover every available framework, and that is not its purpose either (it is not a book on a game engine, nor a book on Direct X, …).
The third part is also mainly about Direct X, more precisely its 3D part. It lays down the basis for any 3D game engine, but it is not the book’s goal to be exhaustive about the design of a 3D engine. Microsoft also imposes a set of rules to get the appropriate “Windows compatible” logo, which is needed if you want to sell the game. The last chapter in this part tackles debugging the game. I have to say this is much needed, and too often it doesn’t appear in game programming books, although it is one of the pillars of programming. Different debugging techniques are addressed.
Finally, the last part tackles how development must be managed: scheduling and milestones, testing and fixing bugs, and how the game will finally be published (what needs to be done at the end or after the end). It is all you need if you are in the game industry and have to handle a commercial release.
My edition of the book is several years old, and I felt it, as several examples are outdated (the requirements for a present-day game, the Direct X version, the Windows versions that need to be supported, …). I couldn’t check whether the third edition updated this, but it should have: the whole point of a new edition is to update these facts. So if you want to code a game, buy the latest edition of this book.
Game Coding Complete (Paperback)
by Mike McShaffry
ISBN: 1584506806
Price: USD 37.79
Gary Ruben came up with the excellent idea of visualizing the minimum spanning tree of a Delaunay tessellation in addition to the Delaunay tessellation itself. After he sent me his code, I spent some time playing with it, because I found that, with the right choice of visualization parameters, it gave me a nice understanding of what a minimum spanning tree is: a tree of minimal total length connecting all the vertices of the graph, and embedded in the graph. In the visualization, the Delaunay graph is displayed in grey, and the minimum spanning tree in thick colored lines.

The minimum spanning tree is calculated using Prim’s algorithm, on the fully connected, distance-weighted graph of all points. One can clearly see that it is embedded in the Delaunay graph. In fact, I have checked that calculating a minimum spanning tree on the Delaunay graph or on the complete graph gives the same result.
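For readers who want to see the idea without VTK or Mayavi, here is a minimal sketch of Prim's algorithm on the complete, distance-weighted graph of a set of 2D points. It is not the code behind the picture: the function name prim_mst and the sample points are made up for illustration, and the simple O(n²) formulation is chosen for readability rather than speed.

```python
import math


def prim_mst(points):
    """Return the edges (i, j) of a minimum spanning tree of the complete
    Euclidean graph on `points`, a list of (x, y) tuples.

    Hypothetical helper for illustration; O(n^2) time, no priority queue.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_parent = [-1] * n       # tree vertex providing that connection
    best_dist[0] = 0.0           # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Attaching u may offer cheaper connections to the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[u][0] - points[v][0],
                               points[u][1] - points[v][1])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (2, 0), (0, 5)]
    print(prim_mst(sample))
```

A tree on n points always has n - 1 edges, which is a quick sanity check on the result.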
The code to create this picture can be found here.
Hi Ondrej,
How are you?
I am about to publish a new free software project, a new simple PHP framework, and I am interested in your advice.
You started SymPy and were able to make other people join you and develop it with you.
How did you do it?
How did it happen?
Did you actively call for other people or they spontaneously showed interest and joined you?
Are the other major contributors people who were your friends before you started the project?
Did you need to create or manage the project in a particular way to make it attractive to other people?
Are there things you are aware of that promote collaboration or demote it?
I was never successful in doing the same with Winpdb: while it became reasonably popular, no one has ever joined me to develop it, except for a notable tutorial contribution by Chris Lasher, which was developed independently.
Now with the new project, I am wondering what are my chances of making other people try it and take it on. On the one hand it is a new and fresh code base in an interesting field, on the other hand, why would anyone bother to spend their energy on this new project when they have Symfony or Drupal?
What do you think?
BTW, Ohloh believes you have a median of 19,000 lines of changed code per month since the start of their log. Can this be true? Is this humanly possible? According to it SymPy has over 1,000,000 lines of code? I can't understand these numbers. Winpdb has about 25,000 lines after 3 years of development. And from my experience 1,000,000 lines of code projects need about 20-50 full time developers to work on for 2-5 years which is about 40-250 man years. And as if this is not enough you are listed as owner in a dozen other projects in Google code and have enough time to become an awarded scientist. How is this possible?
http://www.ohloh.net/p/sympy/contributors/
BTW2, do you still use Winpdb? If you find yourself using it less, can you say what are the reasons, or what it would take to make it more useful?
BTW3, How is SymPy doing?
Cheers,
Nir
Are the other major contributors people who were your friends before you started the project?
No, not a single major contributor was my friend before I started the project. Every single one of them became a developer using the procedure I described above, e.g. they first showed up on the list or in the issues, and maybe even the very first patch was not a high quality one (and if I was stupid and arrogant, or didn't see the big potential, I would just ignore them). But when given a chance, they became extremely good developers and sympy would simply not be here without them.
Did you actively call for other people or they spontaneously showed interest and joined you?
I very much encourage everyone to contribute, but the initial interest must come from them, e.g. they at least have to show up on the mailing list or in the issues, so that I know about them. But once I know they are interested in some issue, yes, I try to invite them to fix it, with my help.
by Ondřej Čertík (noreply@blogger.com) at May 10, 2009 01:47 PM
For quite some time, I wanted to add code coverage to the C part of numpy. The upcoming port to python 3k will make this even more useful, and besides, Stefan Van Der Walt promised me a beer if I could do it.
There are several tools to do code coverage of C code – the most well known is gcov (I obviously discard non-free tools – those tend to be fairly expensive anyway). The problem with gcov is its inability to do code coverage for dynamically loaded code such as python extensions. The solution is thus to build numpy and statically link it into python, which is not totally straightforward.
I first looked into simpler extensions: the basic solution is to add the source files of the extensions into Modules/Setup.local in python sources. For example, to build the zlib module statically, you add
*static*
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
Then run make; this will statically link the zlib module into python. One simple way to check whether the extension is indeed statically linked is to look at the __file__ attribute of the extension. In the dynamically loaded case, __file__ gives the location of the .so, but the attribute does not exist in the static case.
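The __file__ check can be scripted. The following is only a sketch: the helper name has_file_attribute is made up, and it uses sys (always compiled into the interpreter) and json (a regular module on disk) as stand-ins, since whether zlib is built in depends on how your python was built.

```python
import importlib
import sys


def has_file_attribute(module_name):
    """Return True if the imported module exposes __file__.

    Hypothetical helper: modules compiled into the interpreter have no
    __file__, while modules loaded from disk (.py or .so) do.
    """
    module = importlib.import_module(module_name)
    return hasattr(module, "__file__")


if __name__ == "__main__":
    for name in ("sys", "json", "zlib"):
        built_in = name in sys.builtin_module_names
        print(name, "has __file__:", has_file_attribute(name),
              "| listed as built-in:", built_in)
```

sys.builtin_module_names gives the same information from the other side: it lists the modules compiled into the running interpreter.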
To use gcov, two compilation flags are needed, and one link flag:
gcc -c -fprofile-arcs -ftest-coverage …
gcc … -lgcov
Note that -lgcov must be near the end of the link command (after the other library flags). To do code coverage of e.g. the zlib module, the following works in Modules/Setup.local:
*static*
zlib zlibmodule.c -I$(prefix)/include -fprofile-arcs -ftest-coverage -L$(exec_prefix)/lib -lz -lgcov
If everything goes right after a make call, you should have two files, zlibmodule.gcda and zlibmodule.gcno, in your Modules directory. You can now run gcov in Modules to get code coverage:
cd Modules && gcov zlibmodule
Of course, since nothing was run yet, the code coverage is 0. After running the zlib test suite, things are better though:
./python Lib/test/test_zlib.py && gcov -o Modules Modules/zlibmodule
The -o option tells gcov where to look for the gcov data (the .gcda and .gcno files), and the output is
File ‘./Modules/zlibmodule.c’
Lines executed:74.55% of 448
I quickly added a hack to numscons to build the numpy C code statically instead of dynamically (static_build branch, available on github). As it is, numpy will not work: some source code modifications are needed. These modifications reside in the static_link branch, on github as well.
Then, to statically build numpy with code coverage:
LINKFLAGSEND="-lgcov" CFLAGS="-pg -fprofile-arcs -ftest-coverage" $PYTHON setupscons.py scons --static=1
where $PYTHON refers to the python you built from sources. This will build every extension as a static library. To link them into the python binary, I simply added a fake source file and linked the numpy libraries to it in Modules/Setup.local:
*static*
multiarray fake.c -L$LIBPATH -lmultiarray -lnpymath
umath fake.c -L$LIBPATH -lumath -lnpymath
_sort fake.c -L$LIBPATH -l_sort -lnpymath
where LIBPATH refers to the path where the static numpy libraries are found (e.g. build/scons/numpy/core in your numpy source tree). To run the test suite, one has to make sure to import a numpy from which the multiarray, umath and _sort extensions have been removed; it will crash otherwise (as the extensions would be present twice in the python process, once as dynamically loaded code, once as statically linked code). The test suite kind of runs (~1500 tests), and one can get code coverage afterwards. For the multiarray extension, here is what I get:
File ‘build/scons/numpy/core/src/multiarray/common.c’
Lines executed:52.56% of 293
build/scons/numpy/core/src/multiarray/common.c:creating ‘common.c.gcov’
File ‘build/scons/numpy/core/include/numpy/npy_math.h’
Lines executed:50.00% of 12
build/scons/numpy/core/include/numpy/npy_math.h:creating ‘npy_math.h.gcov’
File ‘build/scons/numpy/core/src/multiarray/arraytypes.c’
Lines executed:62.23% of 1030
build/scons/numpy/core/src/multiarray/arraytypes.c:creating ‘arraytypes.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/hashdescr.c’
Lines executed:68.38% of 117
build/scons/numpy/core/src/multiarray/hashdescr.c:creating ‘hashdescr.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/numpyos.c’
Lines executed:81.48% of 189
build/scons/numpy/core/src/multiarray/numpyos.c:creating ‘numpyos.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/scalarapi.c’
Lines executed:47.43% of 350
build/scons/numpy/core/src/multiarray/scalarapi.c:creating ‘scalarapi.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/descriptor.c’
Lines executed:61.96% of 1028
build/scons/numpy/core/src/multiarray/descriptor.c:creating ‘descriptor.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/flagsobject.c’
Lines executed:42.31% of 208
build/scons/numpy/core/src/multiarray/flagsobject.c:creating ‘flagsobject.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/ctors.c’
Lines executed:64.69% of 1583
build/scons/numpy/core/src/multiarray/ctors.c:creating ‘ctors.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/iterators.c’
Lines executed:70.41% of 774
build/scons/numpy/core/src/multiarray/iterators.c:creating ‘iterators.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/mapping.c’
Lines executed:77.95% of 721
build/scons/numpy/core/src/multiarray/mapping.c:creating ‘mapping.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/number.c’
Lines executed:51.80% of 361
build/scons/numpy/core/src/multiarray/number.c:creating ‘number.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/getset.c’
Lines executed:44.09% of 372
build/scons/numpy/core/src/multiarray/getset.c:creating ‘getset.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/sequence.c’
Lines executed:50.00% of 60
build/scons/numpy/core/src/multiarray/sequence.c:creating ‘sequence.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/methods.c’
Lines executed:47.35% of 942
build/scons/numpy/core/src/multiarray/methods.c:creating ‘methods.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/convert_datatype.c’
Lines executed:56.11% of 442
build/scons/numpy/core/src/multiarray/convert_datatype.c:creating ‘convert_datatype.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/convert.c’
Lines executed:66.67% of 183
build/scons/numpy/core/src/multiarray/convert.c:creating ‘convert.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/shape.c’
Lines executed:76.81% of 345
build/scons/numpy/core/src/multiarray/shape.c:creating ‘shape.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/item_selection.c’
Lines executed:55.07% of 937
build/scons/numpy/core/src/multiarray/item_selection.c:creating ‘item_selection.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/calculation.c’
Lines executed:59.08% of 523
build/scons/numpy/core/src/multiarray/calculation.c:creating ‘calculation.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/usertypes.c’
Lines executed:0.00% of 111
build/scons/numpy/core/src/multiarray/usertypes.c:creating ‘usertypes.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/refcount.c’
Lines executed:66.67% of 129
build/scons/numpy/core/src/multiarray/refcount.c:creating ‘refcount.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/conversion_utils.c’
Lines executed:59.49% of 316
build/scons/numpy/core/src/multiarray/conversion_utils.c:creating ‘conversion_utils.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/buffer.c’
Lines executed:56.00% of 25
build/scons/numpy/core/src/multiarray/buffer.c:creating ‘buffer.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/scalartypes.c’
Lines executed:42.42% of 877
build/scons/numpy/core/src/multiarray/scalartypes.c:creating ‘scalartypes.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/ucsnarrow.c’
Lines executed:89.36% of 47
build/scons/numpy/core/src/multiarray/ucsnarrow.c:creating ‘ucsnarrow.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/arrayobject.c’
Lines executed:58.75% of 514
build/scons/numpy/core/src/multiarray/arrayobject.c:creating ‘arrayobject.c.gcov’
File ‘build/scons/numpy/core/src/multiarray/multiarraymodule.c’
Lines executed:49.12% of 1134
build/scons/numpy/core/src/multiarray/multiarraymodule.c:creating ‘multiarraymodule.c.gcov’
The figures themselves are not that meaningful ATM, since the test suite does not run completely, and the built numpy is a quite bastardized version of the real numpy.
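To condense a listing like the one above into a single figure, one can weight each file's percentage by its line count. The sketch below assumes only the "Lines executed:X% of N" format shown in the output; the function name total_coverage is made up, and because gcov rounds the percentages, the total is an approximation.

```python
import re

# Matches gcov summary lines such as "Lines executed:52.56% of 293".
_LINES_RE = re.compile(r"Lines executed:(\d+(?:\.\d+)?)% of (\d+)")


def total_coverage(gcov_output):
    """Return (overall_percent, total_lines) for a gcov summary text.

    Hypothetical helper: each file's percentage is weighted by its number
    of lines, so large files count for more than small ones.
    """
    executed = 0.0
    total = 0
    for percent, count in _LINES_RE.findall(gcov_output):
        executed += float(percent) / 100.0 * int(count)
        total += int(count)
    overall = 100.0 * executed / total if total else 0.0
    return overall, total


if __name__ == "__main__":
    import sys
    # Usage sketch: gcov ... | python this_script.py
    percent, lines = total_coverage(sys.stdin.read())
    print("Overall: %.2f%% of %d lines" % (percent, lines))
```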
The numpy modifications, although small, are very hackish: I just wanted to see if that could work at all. If time permits, I hope to be able to automate most of this, and have a system that can be integrated in the trunk. I am still not sure about the best way to build the extensions themselves. I can see other solutions, such as producing a single file per extension, with every internal numpy header/source integrated, so that they could easily be built from Setup.local. Or maybe a patch to the python sources so that make in the python sources would automatically build numpy.


I'm writing some proposals to expand support for Python infrastructure (think cross-platform build and test farms a la Snakebite) and for the Mellon Foundation application, I'd like to find out how Python is being used in the humanities. I found NLTK, the Natural Language Toolkit; what else is big?
thanks, --titus
After last week’s book review of Martin Fowler’s Refactoring, I’d like to review another book, more oriented towards patterns and refactoring.
First, this book could be seen as a follow-up to Refactoring, as almost all the described processes use steps from it. The first chapters explain what to expect from refactoring (how it affects software architecture) and from patterns, and how to detect code that needs refactoring. Like the original Design Patterns book, the catalog is split into categories (Creation, Simplification, Generalization, Protection, Accumulation and Utilities), and each entry is described as in the GoF book (which suits me quite well).
The code is in Java, so it may sometimes be difficult to find the equivalent construct in your favorite language, but it is doable if you know the basics of Java.
This book quotes Martin Fowler several times, but this is not related to the fact that it is in the publisher’s “Fowler signature” collection. Refactoring to Patterns addresses what Refactoring couldn’t and goes further, without imposing, just by suggesting.
Refactoring to Patterns (Addison-Wesley Signature Series) (Hardcover)
by Joshua Kerievsky
ISBN: 0321213351
Price: USD 49.63
Gary Ruben just asked me if it was possible to retrieve the triangulation information from my previous Delaunay example. Actually the reason I came up with this example is that Emanuelle Gouillart, my partner[*], needed to do Delaunay triangulation on some data. She was kind enough to extract that code from her code base. Here it is.
[*] The various languages do not seem to have evolved quickly enough to cope with the fact that people can now have a stable long-term relationship with someone they are not married to. What word should I be using here: ‘girlfriend’, ‘partner’… ?
I am pleased to announce that EPD (Enthought Python Distribution) version
4.2.30201 has been released. You may find more information about EPD, as
well as download a 30 day free trial here:
http://www.enthought.com/products/epd.php
You can check out the release notes here:
https://svn.enthought.com/epd/wiki/Py25/4.2.30201/RelNotes
The Enthought Python Distribution (EPD) is a “kitchen-sink-included”
distribution of the Python Programming Language, including over 80
additional tools and libraries. The EPD bundle includes NumPy, SciPy,
IPython, 2D and 3D visualization, database adapters, and a lot of
other tools right out of the box.
http://www.enthought.com/products/epdlibraries.php
It is currently available as a single-click installer for Windows XP (x86),
Mac OS X (a universal binary for OS X 10.4 and above),
RedHat 3, 4 and 5 (x86 and amd64), as well as Solaris 10 (x86).
EPD is free for academic use. An annual subscription including installation
support is available for individual and commercial use. Additional
support options, including customization, bug fixes and training classes
are also available:
http://www.enthought.com/products/support_level_table.php
- Ilan
I read this book when I started my PhD thesis. It helped me lay down the basics of software design.
It was the first book where I found the code smell concept. And my former code really smelt…
This book became a reference for me. The patterns catalog (because I consider them patterns) seems almost exhaustive to me. Some of the steps described are very easy (such as inlining a function). It may be more difficult at first to extract a method from another, but it is still the basis of refactoring.
The main use of the book isn’t telling you what you already know (inlining, splitting a function, …); it is showing a new step, one you didn’t think of and that solves the problem at hand. Stating the problem, the smell, is also a main focus of the book, and Martin Fowler gives hints to distinguish between the different smells.
The author is one of the best-known advocates of unit tests, so it is no wonder they have a central place in the different patterns. This way, one can check that a refactoring does not change the code’s behavior. The examples given are also clear and simple enough, and completely described step by step (especially as, further into the book, new patterns build on preceding ones).
Finally, the code is written in Java, but this isn’t a problem for someone used to object-oriented languages. Useful tools are also presented, but they are old, dating from the time when the book was written (and they are mainly geared toward Java).
This was a really simple book. Since I closed it, I didn’t have to open it again, as the patterns are pretty simple. But still, as for every pattern, you first have to understand it (and read it) to acknowledge it. And I tend to refactor regularly following the ideas behind this book.
Refactoring: Improving the Design of Existing Code (Addison-Wesley Object Technology Series) (Hardcover)
by Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts
ISBN: 0201485672
Price: USD 46.79
I was working on a client/server project where we send collections of data across the wire. I needed a method of matching datasets on the client and server, and the python hash function seemed ideal. I suspected that the hash function might have different behaviour on different systems, but conveniently forgot to test it until after I tried to deploy it.
I expected differences, but I didn’t really know to what extent, so I did a little research. So far, ints are the only thing I have found that hash the same, because int’s __hash__ function just returns the int value. Otherwise, Python’s hash functions depend on multiplication using long ints.
While doing my research, I found a page discussing hashing in Python 2.3. The algorithms are similar to the C implementations in Python 2.6.
Of course, I got bitten because Python 2.5 on OS X 10.4 and on 64-bit RedHat 5 didn’t hash my objects the same way. In the end, I serialized the data’s metadata and computed an MD5 digest instead, which requires more CPU cycles, but at least it works…
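A minimal sketch of that kind of workaround follows. It is not the code from the project: the JSON serialization, the function name stable_digest and the sample metadata are all assumptions made for illustration. The point is that the digest depends only on the serialized bytes, not on the platform's word size or on the interpreter's hash implementation.

```python
import hashlib
import json


def stable_digest(metadata):
    """Return a hex MD5 digest of `metadata` that is the same on every
    platform.

    Hypothetical helper: `metadata` must be JSON-serializable. Keys are
    sorted and separators fixed so equal dictionaries give equal bytes.
    """
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Made-up metadata describing a dataset on the client and the server.
    client_side = {"name": "temperatures", "rows": 1200, "columns": ["t", "T"]}
    server_side = {"columns": ["t", "T"], "rows": 1200, "name": "temperatures"}
    print(stable_digest(client_side) == stable_digest(server_side))
```

MD5 is used here only as a fingerprint for matching datasets, as in the post; it is not suitable where resistance to deliberate collisions matters.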
Tonight I sat down and played a bit with VTK’s Delaunay tessellation filter. I wanted to inspect the local structure of a graph created by Delaunay tessellation of random points. To see the structure better, I selected a slab of the resulting unstructured grid. I think the image is not only instructive for explaining what a Delaunay tessellation is, it also looks pretty cool. Here is the image and the Mayavi script that creates it.

I missed a date with my wife on Friday to help push the beta release of EPD out for all 10 platforms we are currently supporting (WinXP, WinVista, Mac OS X 10.5(10.4)-intel(ppc), RH3 (x86, amd64), RH5 (x86, amd64)). The 6 different binaries were uploaded to our download servers early Saturday morning (4:00 am Central Time). I’m excited for people to try the new release as it brings together recent NumPy, SciPy, matplotlib, and IPython with many additional tools.
One of the things I’m very enthused to have people try is an alpha version of EPDLab which comes in the distribution. EPDLab is an open-source Envisage application which offers an IPython shell along with a linked code editor to allow highly interactive development. EPDLab also contains a “search documentation strings” widget which uses Whoosh and some Robert Kern indexing Fu to provide a very useful search for all of the powerful tools pre-packaged with EPD.
Get the beta2 and start using a very full-featured distribution of Python across your organization today. Download Beta2.
If you try this recent beta, I’d love to hear from you about any feedback you may have (both positive and negative). Email me at info@enthought.com. The final version of the next release of EPD (4.2.30201) should be out by early next week.
-Travis
I'd like to invite you to attend the last of the Michigan State University CSE colloquia for the 2008-2009 academic year: jointly sponsored as an AT&T Visiting Lecturer by the MSU LCT, and the CSE department, Sam Ramji will speak about
Open Source at Microsoft: The Past, Present and Future
in CommArts room 147, Friday May 1, at 11:00am. I encourage you all to attend and to forward this on to others who might be interested! As you know, open source software is playing an increasingly big part in education, academia, science, and business, and so I expect this to be a very interesting talk.
Contact me at ctb@msu.edu for further information.
--
Abstract:
Since Microsoft established its Open Source Lab in Redmond more than five years ago, it has worked with many open source players to make Windows the best platform for all applications to run on. But this has not been without its challenges and there is a lot more work to be done on this front. This talk will cover the thinking behind Microsoft's current open source strategy and what this means for the software engineers of the future. It will also spotlight some innovative Open Source projects the company is supporting at universities across the world.
Biography:
Sam Ramji is the Senior Director of Platform Strategy leading Microsoft's platform strategy efforts across the company, including long-term strategic planning in the Windows Server and Tools organization. Sam's primary focus is to drive Microsoft's Linux and Open Source Strategy, working together with Microsoft technology development teams and open source communities to build interoperable solutions.
Prior to his current role at Microsoft, Sam was a Director of Emerging Business working on the Silicon Valley Campus where he managed relationships with Venture Capitalists and entrepreneurs. Prior to joining Microsoft, Sam led technical product strategy at BEA Systems, engineering teams building large-scale applications on Open Source software (at Ofoto.com) as well as hands-on development of client, client-server, and distributed applications on Unix, Windows, and Macintosh at prior companies.
Sam holds a Bachelor of Science degree in Cognitive Science from the University of California at San Diego, and is a member of the Institute for Generative Leadership.
Open source coding is like a not-so-demanding mistress: I work on it at night, surreptitiously, after my wife and daughter are asleep. twill and figleaf are like bastard children, who only get attention when I can spare it from my "real" family (my teaching, research or my actual family, depending ;)
Sigh.
--titus
If last week’s book review was too complicated for you, perhaps this book will suit you better. Fewer design patterns, but a funnier way to describe them.
Only twelve patterns are explained, but more important is the fact that each of them is detailed, with examples and exercises, as well as important phrases displayed as images. Together with humor, these are the characteristics of the “Head First” collection.
Instead of using the same example throughout the book, each chapter has more or less its own concrete example. The last chapters explain how patterns can interact to create complex applications, or how they can be combined into new design patterns (such as MVC), and then how this will change the way you think about software architecture.
This book focuses on the most used and perhaps useful design patterns. This way, it can present them differently than a simple catalog, but I have to say that this approach will not be suited for everyone. For instance, I appreciate books that go straight to the point.
This kind of book is well suited to people who want to start with design patterns, but not to people already familiar with them: they should go for catalogs.
Head First Design Patterns (Paperback)
by Elisabeth Freeman, Eric Freeman, Bert Bates, Kathy Sierra
ISBN: 0596007124
Price: USD 29.67
Anyone out there used disco (http://discoproject.org/)? Comments, good/bad/neutral?
From the page:
Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.
The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.
thanks!
--titus
As I’ve said before, I’ve done several book reviews in the past. I will start with a small series on design patterns books.
This book is one of the “must-haves” for your library. If you write code or manage IT or Computer Science projects, you will need this book to lay down the basic software architecture.
The first two chapters are an introduction and explain why design patterns exist, how they should be used, good and bad practices, … Design patterns without rules to apply them are useless (just as the original architectural patterns are useless without drawing skills). A practical example is the subject of the second chapter.
The design patterns are presented in a three-part catalog. Each pattern is described by a complete explanation, a UML diagram, the interactions between the pattern elements, as well as some implementation solutions (not all solutions can be written, as they are language-dependent).
Creational patterns are about creating new objects. They include the abstract factory (constructing several objects of different kinds), the builder (a more elaborate constructor), the factory method (overloading a class method to create objects based on different classes), the prototype (creating new objects by cloning an instance) and the singleton (creating only one instance of a given class).
Structural patterns are more about the actual software architecture. They include the adapter (translating an interface to another one), the bridge (separating an interface from different implementations), the composite (allowing several objects of a hierarchy to be composed together), the decorator (adding characteristics to an object), the facade (offering an interface to several classes), the flyweight (sharing the same objects between instances so as to reduce memory overhead) and the proxy (using one object to access another, potentially hidden, one).
Behavioral patterns enable a piece of software to change its own behavior. They include the chain of responsibility (letting a request be processed by whoever can handle it), the command (creating complex requests), the interpreter (describing how a language can be processed), the iterator (providing a way of browsing the content of a data container), the mediator (allowing communication between different classes), the memento (enabling the state of an object to be restored), the observer (sometimes also called listener, creating a way for instances to be updated/called by another one), the state (allowing behavior to change on the fly), the strategy (providing several ways of doing something), the template method (providing a skeleton for an algorithm) and the visitor (allowing code to be executed for every element in the content of an object). This set of patterns is perhaps the most heterogeneous one (although the state and the strategy are in fact structurally the same, the only difference being how their actions are interpreted).
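To make the remark about the state and the strategy concrete, here is a minimal Python sketch (the book itself uses C++ and Smalltalk). All class names are hypothetical and chosen only for illustration; the point is that the same delegation structure can be read either way.

```python
class Upper:
    """One interchangeable behavior (hypothetical example class)."""
    def render(self, text):
        return text.upper()


class Lower:
    """Another behavior exposing the same interface."""
    def render(self, text):
        return text.lower()


class Printer:
    """Context object that delegates to whichever behavior it holds."""
    def __init__(self, behavior):
        self.behavior = behavior

    def render(self, text):
        return self.behavior.render(text)


# Strategy reading: the caller picks the algorithm when building the object.
printer = Printer(Upper())
strategy_result = printer.render("GoF")

# State reading: the same object swaps its behavior during its lifetime.
printer.behavior = Lower()
state_result = printer.render("GoF")

print(strategy_result, state_result)
```

The code is identical in both readings; only the intent (who changes the delegate, and when) differs.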
This book, sometimes referred to as the GoF book, lays down the basis of software design. These 23 patterns are not the only ones you may use (and some of them are seldom used), but they are the building blocks of more complicated ones. If there is one design patterns book you should buy, it is this one.
Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Professional Computing Series) (Hardcover)
by Erich Gamma, Richard Helm, Ralph Johnson, John M. Vlissides
ISBN: 0201633612
One of the advantages of this year's PyCon was that it was (again) held in Chicago, the home town of Leapfrog Online. Since they use twill quite a bit, and were bothered by some of the poor design decisions and bugginess, they were keen to get together with me to move twill forward. So we scheduled a sprint for the Monday after PyCon.
In preparation for the sprint, I did a bit of research into how widely twill was being used. Downloads only roughly correlate, but I was surprised to discover that in just the last year, there were over 6,000 downloads from my site; this doesn't count Debian users, who can install it from one of the Debian dists. I'd also been surprised by the number of people at PyCon who came up to me and told me that they were using twill internally in their companies -- at least two very expert groups had settled on it for some of their internal monitoring and testing. Very cool! What this told me is that twill is very nice, simple and usable for many people and we shouldn't get too adventuresome; good thing to know ;).
The sprint basically consisted of us talking through a few fundamental issues like bundling and future development, then fixing a few items, while I forwarded on all of the bug reports I've gotten over the last two years.
The source code has now moved to code.google.com/p/twill and you can see all of the issues in the usual place.
During the sprint we made a few decisions:
- 0.9.2 is Coming Real Soon, as a largely feature/API-stable release that fixes a number of simple bugs and integrates the latest mechanize.
- for 0.9.2 and 1.0 we will provide both bundled and unbundled versions of twill; the bundled versions will contain BeautifulSoup, mechanize, ClientForm, and pyparsing. The unbundled version will simply specify what versions of those packages it needs. This unbundling will help packagers out while letting individuals (like, say, Windows users) install twill easily.
- 1.0 is further down the road, but will only add a few features. The main goal of 1.0 is to be nice & stable.
- 2.0 and beyond is on the table but exactly what it will be is unclear. I have my own ideas but since I'm not doing much Web developing I may let others take over.
Since the sprint, Pam Z. finished putting the issues into the tracker and we've been slowly trying to work through them.
Props to Pam Z., Nat W., Kevin B., and Jesse for coming to the sprint, and to Terry Peppers and Leapfrog for pushing it! And thanks to Leapfrog for an excellent steak dinner afterwards ;)
--titus
We’ve had a number of recent internal discussions about EPD during which the phrases “that won’t work on OS X 10.4” or “does upstream have PPC support?” came up quite often. For example, a recent discussion about the importance of relocatable EPD egg installs sputtered because we realized Mac OS X 10.4 doesn’t support RPATH settings in binary headers, which meant we’d have to do something special just for that platform.
Once we realized this commonality, we next wondered how important OS X 10.4 and PPC support actually is for the EPD user community. Thus the point of this blog post: to get some community input. This is your chance to speak up if you need OS X 10.4 and/or PPC support. I can’t promise that a single ‘yes’ will sway our decision making, but certainly the more people who speak up, the more likely we are to try to continue the support.
For those who don’t know, sys.path is the list of directories that the Python interpreter traverses on each module import to look for the file of the imported module.
This blog post is about the consequences of having a long sys.path. I’ll try to keep it short, although I would have a lot to say. I am just reacting to Noah Gift’s post on performance improvement, not writing a full essay on why overloading sys.path is considered harmful.
When using easy_install (or setuptools), each new project is installed in a different directory, and the directory is added at runtime to the sys.path (the addition at runtime confuses many users who are not aware of it). As a result, you quickly end up with more than 40 directories on your sys.path. These directories are ‘stat-ed’ one after the other on each module import. Thus, if you have a long sys.path, each import triggers a large number of system calls to read directories. To check this out, simply try:
strace python -c "import foobar" 2>&1 | less
You can see the amount of noise created by a simple (failing) import statement. On a file system with high latency (such as NFS, which we use at work), this is very costly.
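The same effect can be looked at from inside Python. The following is only a rough sketch: the helper name, the module name and the 40 empty directories are made up for illustration, and it assumes a modern Python 3, which caches directory listings, so the figures will not match what strace shows on an older interpreter.

```python
import os
import sys
import tempfile
import time


def time_failed_import(extra_dirs, name="foobar_does_not_exist"):
    """Time one failing import with extra_dirs appended to sys.path."""
    saved = list(sys.path)
    sys.path.extend(extra_dirs)
    try:
        start = time.perf_counter()
        try:
            __import__(name)
        except ImportError:
            pass  # the import is expected to fail
        return time.perf_counter() - start
    finally:
        sys.path[:] = saved  # always restore the original path


with tempfile.TemporaryDirectory() as root:
    # Simulate 40 separately installed projects, one directory each.
    extra = []
    for i in range(40):
        path = os.path.join(root, "project%d" % i)
        os.mkdir(path)
        extra.append(path)
    short_time = time_failed_import([])
    long_time = time_failed_import(extra)

print("short sys.path: %.6f s" % short_time)
print("long sys.path:  %.6f s" % long_time)
```

On a local disk the difference is small; the cost only becomes painful when each directory lookup goes over the network.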
Noah joyfully reports performance improvements by hijacking the Python import mechanism. I claim that part of what Noah has done is not really hijacking the import mechanism, it is undoing the hijacking performed by setuptools.
I know I am being rude, but many people raised this point before, and it is not getting any traction from the setuptools maintainer. I claim that you should not be using setuptools or easy_install if you want performance or control. I claim that you should not be using setuptools unless you understand well what you are doing (which defeats the name easy_install).
The way I install packages with easy_install when I want good control is to work in a virtual environment to discover the dependencies, and then run:
easy_install -Zeab . package_name
to download the source of each required package, and
python setup.py install --single-version-externally-managed --record ./foobar
if the package itself is using setuptools.
As you can see, setuptools makes it really hard to do a clean install. It’s a design choice :(.
Another alternative is to use pip, which I strongly encourage.
Enthought is offering “Introduction to Scientific Computing in Python” at our offices in Austin, Texas from June 15th to June 19th. This course is intended for scientists and engineers who want to learn to use Python for day-to-day computational tasks.
The cost for the course is $2500. Please see the course description on the Enthought website for details.
Space is still available in our course on Python for Science, Engineering, and Financial Analysis, May 18th to 21st, in New York City.
Profiling comes in three different flavors. The first is emulation, where the behavior of a processor is emulated; the second is sampling, where the profiler samples the status of the program at regular intervals; and finally instrumentation, where the profiler gets information when a subroutine is called and when it returns. As with the Heisenberg uncertainty principle, profiling changes the exact behavior of your program. This is something you have to remember when analyzing a profile.
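To illustrate the instrumentation flavor only (this is not how Valgrind works, since Valgrind is an emulator), here is a small Python sketch using the standard sys.setprofile hook. The functions helper and work are hypothetical stand-ins for a real program.

```python
import sys
from collections import Counter

call_counts = Counter()


def tracer(frame, event, arg):
    """Called by the interpreter on function calls and returns."""
    if event == "call":  # a Python function is being entered
        call_counts[frame.f_code.co_name] += 1


def helper(n):
    return n * 2


def work():
    total = 0
    for i in range(3):
        total += helper(i)
    return total


sys.setprofile(tracer)  # start instrumenting
try:
    result = work()
finally:
    sys.setprofile(None)  # stop instrumenting

print(result, dict(call_counts))
```

The hook itself runs on every call, which is exactly the kind of overhead that changes the behavior of the profiled program.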
Valgrind is an open source emulation profiler. It is freely available on standard Linux platforms. As it emulates the processor, the program runs far slower than it does natively. This means that the cost of I/O is underestimated. The advantage is that you get every detail of the memory behavior (cache misses, for instance). Valgrind does not emulate all processors, but you can tweak it to approximate your own.
This is more or less a translation of my French tutorial on Valgrind profiling.
Calling the profiler is really easy:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes program arguments
Here, I ask valgrind to use the callgrind profiler plugin, to dump the executed instructions (which helps to find which part of a function is really costly, not only which function), to simulate the cache (to help improve processor usage) and to collect jumps (to get a dynamic view of the program behavior). Of course, the program must have been compiled with the appropriate compilation options (at least -g).
KCacheGrind is probably the best tool to visualize and analyze valgrind results (it can also display the results of other profilers).
When opening a profile, KCacheGrind may not find the associated source files. In that case, you can add their folder to the annotation folders.

I think the most important graph KCacheGrind provides is the Callee Map. It can be colorized by different means (files, classes, …), the main point being that the Callee Map provides an image where the surface of a function represents its weight in the program execution (weight being the number of instructions, cache misses, …). Unfortunately, it appears that in some cases KCacheGrind is not able to create everything Callee related. I don’t know why, but I got this on a RedHat 4 with its associated KCacheGrind and the latest valgrind.
Call graphs also show how much each function consumes. Double-clicking on a function (in the call graph or in the Callee Map) “activates” it: the original source code is shown (with jumps, if they were collected) with the cost of each instruction, together with the functions that call it and the functions it calls. Another important thing is the difference between the self cost (sometimes called exclusive cost) and the inclusive cost. The former is the cost of the function alone, the latter is the cost of the function plus the cost of the functions it calls.
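The relation between the two costs can be written down in a few lines of Python. The function names and numbers below are invented for the example, and the sketch assumes a simple call tree (each function called from one place, no recursion); a real profile attributes costs per call edge.

```python
# Hypothetical self costs (e.g. instruction counts) per function.
self_cost = {"main": 10, "parse": 40, "compute": 120, "dot": 300}

# Hypothetical call tree: caller -> list of callees.
calls = {
    "main": ["parse", "compute"],
    "parse": [],
    "compute": ["dot"],
    "dot": [],
}


def inclusive_cost(name):
    """Self cost of the function plus the inclusive cost of its callees."""
    return self_cost[name] + sum(inclusive_cost(c) for c in calls[name])


for name in self_cost:
    print("%-8s self=%4d inclusive=%4d"
          % (name, self_cost[name], inclusive_cost(name)))
```

Here compute looks cheap by its self cost alone, but its inclusive cost reveals that most of the time is spent below it, in dot.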
Valgrind combined with KCacheGrind gives you a free toolset to profile an application. It is far from perfect, but it provides valuable information. Instrumentation- and sampling-based profilers need a patched kernel (for Linux) or administrator rights (for Windows and Linux), and at the moment they cannot provide every cost, contrary to emulation.
by Ondřej Čertík (noreply@blogger.com) at April 05, 2009 02:22 AM