Planet Python
Last update: March 18, 2015 04:47 AM
March 17, 2015
Wow, it’s already March 2015 and I haven’t written anything here since last year.
March 9 was a public holiday in Victoria, so it was a long weekend. I received an invitation to join the melb-django hack weekend. Since I didn’t have anything scheduled yet, I decided to join and get ready for my first hack-weekend experience.
The hack weekend lasted two days: Sunday March 8 and Monday March 9. Unfortunately I could only join on Sunday. It would have been great if I could have joined the second day too, because on the first day I spent most of my time setting up the environment.
Since no one had an idea for a Django project/app to hack on, Curtis decided to continue the previous hack project, django-dequorum. Curtis explained what django-dequorum is and which features were unfinished. django-dequorum is a simple forum application for Django: users can post threads and other users can comment on them. To make it easier to see what needed to be done, Curtis put the tasks on the GitHub issue tracker. We agreed to use a fork-based git workflow to hack on django-dequorum: everyone would fork the repository and contribute via pull requests.
Suddenly it was 12:30, so we decided to have lunch before hacking on django-dequorum. We walked to Lentil, a unique canteen in Abbotsford that serves a vegetarian menu. They didn’t charge based on how much we ate; they put up a poster explaining how they run the canteen and what it costs, and we put our money in a box.

After lunch, Nicole took the initiative on the look and feel and drew the layout.

I didn’t join the layout discussion because I was still struggling with the environment setup. At first I set up my environment with Python 2.7.x, but later I found out that the project uses Python 3, so I had to re-create my virtualenv.
Then I was confused: I didn’t want git to track changes in the Django project directory, but I did want it to track changes in the django-dequorum app. My first attempt at a symbolic link failed; I finally got it working with a symbolic link to the app’s folder.
Another problem arose after we forked the GitHub repo: how do we keep our forks up to date with funkybob’s repository? I had run into this before, so I already knew the answer and helped my friends add the upstream repository as a second git remote.
I still haven’t made any commits to this open source project, but hopefully I can contribute in the future. It was a nice hack weekend with friendly new friends.
March 17, 2015 09:47 PM
We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization.
EuroPython is a community conference and we are eager to hear about your experience.
Please also forward this Call for Proposals to anyone that you feel may be interested.
Submissions will be open from Monday, March 16, until Tuesday, April 14.
Presenting at EuroPython
We will accept a broad range of presentations, from reports on academic and commercial projects to tutorials and case studies. As long as the presentation is interesting and potentially useful to the Python community, it will be considered for inclusion in the program.
Can you show something new and useful? Can you show the attendees how to use a module, explore a Python language feature, or package an application? If so, please consider submitting a talk.
First time speakers are especially welcome.
There are four different kinds of contributions that you can present at EuroPython:
- Regular talks / 170 slots. These are standard “talks with slides”, allocated in slots of 30 minutes (80 slots), 45 minutes (85 slots), or 60 minutes (5 slots), depending on your preference and scheduling constraints. A Q&A session is held at the end of the talk and included in the time slot.
- Hands-on trainings / 20 slots. These are advanced training sessions that dive into the subject in full detail. Sessions are 2.5 to 3 hours long. Attendees will be encouraged to bring a laptop, so come prepared with fewer slides and more source code. The two training rooms hold 70 and 180 seats.
- Posters / 25 slots. Posters are a graphical way to describe a project or a technology, printed in large format. They are exhibited at the conference, can be read at any time by participants, and can be discussed face to face with their authors during the poster session.
- Helpdesks / 5 slots. Helpdesks are a great way to share your experience with a technology by helping people answer their questions and solve their practical problems. You can run a helpdesk by yourself or with colleagues and friends. People looking for help sign up for a 30-minute slot and come talk to you. No specific preparation is needed; you just need to be proficient in the technology your helpdesk covers.
Discounts for speakers and trainers
Since EuroPython is a not-for-profit community conference, it is not possible to pay out rewards for talks or trainings. Speakers of regular talks will instead receive a special 25% discount on the conference ticket; trainers get a 100% discount to compensate for the longer preparation time. Please note that we cannot give discounts to submitters of posters or helpdesk proposals.
Topics and Goals
Suggested topics for EuroPython presentations include, but are not limited to:
- Core Python
- Alternative Python implementations: e.g. Jython, IronPython, PyPy, and Stackless
- Python libraries and extensions
- Python 2 to 3 migration
- Databases
- Documentation
- GUI Programming
- Game Programming
- Network Programming
- Open Source Python projects
- Packaging Issues
- Programming Tools
- Project Best Practices
- Embedding and Extending
- Education, Science and Math
- Web-based Systems
Presentation goals are usually some of the following:
- Introduce the audience to a new topic
- Introduce the audience to new developments on a well-known topic
- Show the audience real-world usage scenarios for a specific topic (case study)
- Dig into advanced and relatively unknown details on a topic
- Compare different solutions available on the market for a topic
Language for Talks & Trainings
Talks and training should, in general, be held in English.
However, since EuroPython is hosted in Bilbao and EuroPython has traditionally always been very open to the local Python communities, we are also accepting a number of talks and trainings in Spanish and Basque.
The talk submission form lets you choose the language you want to give the talk in.
If you speak Basque/Spanish and don’t feel comfortable speaking English, please submit the talk title and abstract directly in Spanish/Basque. If you are able to give the talk in multiple languages, please submit one proposal for the talk in each language, with title and description adjusted accordingly.
Inappropriate Language and Imagery
Please consider that EuroPython is a conference with an audience from a broad geographical area which spans countries and regions with vastly different cultures. What might be considered a “funny, inoffensive joke” in a region might be really offensive (if not even unlawful) in another. If you want to add humor, references and images to your talk, avoid any choice that might be offensive to a group which is different from yours, and pay attention to our EuroPython Code of Conduct.
Community Based Talk Voting
Attendees who have bought a ticket in time for the Talk Voting period gain the right to vote for talks submitted during the Call For Proposals.
The Program WG will also set aside a number of slots which they will then select based on other criteria to e.g. increase diversity or give a chance to less mainstream topics.
Release agreement for submissions
All submissions will be made public during the community talk voting, to allow all registrants to discuss the proposals. After finalizing the schedule, talks that are not accepted will be removed from the public website. Accepted submissions will stay online for the foreseeable future.
We also ask all speakers to:
- accept the video recording of their presentation
- upload their talk materials to the EuroPython website
- accept the EuroPython Speaker Release Agreement, which allows the EPS to make the talk recordings and uploaded materials available under a CC BY-NC-SA license
Talk slides will be made available on the EuroPython web site. Talk video recordings will be uploaded to the EuroPython YouTube channel and archived on archive.org.
For more privacy related information, please consult our privacy policy.
Contact
For further questions, feel free to contact our helpdesk at helpdesk@europython.eu
March 17, 2015 03:18 PM
How Things Will Proceed
Hi there! I've been busy since the last post. I've been thinking mainly about the following areas:
(Image source: Wikipedia article on placebo drugs; license: public domain)
- What differentiates this blog for my readers?
- What is the best way, for me, of developing my knowledge and mastery of these techniques?
- What pathway is also going to work for people who are either reading casually, or interested in working through problems at a similar pace?
- Preparing examples and potential future blog posts...
I think I have zeroed in on something that is workable. I believe in an integrative approach to learning -- namely, that incorporating information from multiple disparate areas yields insights that aren't possible when considering only a niche viewpoint. At the same time, I also believe it's essentially impossible to learn effectively from a ground-up, broad-base theoretical presentation of concepts. The path to broad knowledge is to start somewhere accessible, and then fold in additional elements from other areas.
I will, therefore, start where I already am: applying machine learning for categorisation of images. At some point, other areas will be examined, such as language processing, game playing, search and prediction. However, for now, I'm going to "focus". That's in inverted commas (quotes) because it's still an incredibly broad area for study.
The starting point for most machine learning exercises is the data, so I'm going to describe the data sets you'll need to follow along. All of these should be readily downloadable, although some are very large. I would consider purchasing a dedicated external drive for this: disk space requirements may reach several hundred gigabytes, particularly if you want to store your intermediate results.
The data sets you will want are:
- The MNIST database. It's included in this code repository, which we will also refer to later when looking at deep learning / neural networks: https://github.com/mnielsen/neural-networks-and-deep-learning
- The Kaggle "National Data Science Bowl" dataset: http://www.kaggle.com/c/datasciencebowl
- The Kaggle "Diabetic Retinopathy" dataset: http://www.kaggle.com/c/diabetic-retinopathy-detection
- Maybe also try a custom image-based data set of your own choosing. It's important to pick something which isn't already covered by existing tutorials, so that you are effectively forced into experimenting with alternative techniques, but which can still be considered a categorisation problem, so that similar approaches should be effective. You don't need to do this, but it's a fun idea. You could use an export of your photo album, the results of Google image searches, or another dataset you create yourself. Put each class of images into its own subdirectory on disk.
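For that last option, here is a minimal sketch of how you might index a data set laid out one-class-per-subdirectory (the directory layout and file extensions are just illustrative assumptions, not a required convention):

```python
import os

def index_image_dataset(root):
    """Walk a tree laid out as root/<class_name>/<image files> and
    return a list of (filepath, class_name) pairs, where the label
    for each image is simply the name of its parent directory."""
    samples = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
                samples.append((os.path.join(class_dir, fname), class_name))
    return samples
```

Because labels come straight from directory names, adding a new class is just a matter of adding a folder of images.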
For downloading data, I recommend Firefox over Chrome, since it is much more capable at resuming interrupted downloads. Many of these files are large, and you may genuinely have trouble. Pay attention to your internet plan's download limits if you have only a basic plan.
The next post will cover the technology setup I am using, including my choice of programming language, libraries and hardware. Experienced Python developers will be able to go through this very fast, but modern hardware does have limitations when applying machine learning algorithms, and it is useful to understand what those are at the outset.
Following that will be the first in a series of practical exercises aimed at building a basic ability to deploy common algorithms on image-based problems. We will start by applying multiple approaches to the MNIST dataset, which is the easiest starting point: the processing requirements are relatively low, as are the data volumes, and tutorials for solving this problem already exist online. That makes it a particularly useful place to start, since it gives you ready-made benchmarks for comparison, and also allows easy cross-comparison of techniques.
I'd really like it if readers could reply with their own experiences along the way. Try downloading the data sets -- let me know how you go! I'll help if I can. I expect that things will get more interesting when we come to sharing the experimental results.
Happy coding,
-Tennessee
March 17, 2015 01:38 PM
Motivation
A few months ago we posted a similar article that presented a way to implement real-time notifications on Django using Node.js, socket.io and Redis. It got quite a few comments asking us why we used Node.js instead of a gevent-based solution. Our response was that we had hands-on experience with the Node.js solution, and that we would try to write another article about a gevent-based solution in the future. This is that article.
This time we’ll be replacing Node.js with a 100% Python implementation using gevent-socketio, and Redis with RabbitMQ. We also didn’t want to bore you with the same vanilla notifications site, so we’re going to build something different. Something useful.
This time we’re going to build a complete GeoDjango-based site to report geo-located incidents in real-time using Google Maps.
The Application
The application is a Django 1.7 site that uses GeoDjango (backed by PostGIS) to track and report in real-time geo-located incidents that occur in certain areas of interest around the world. It provides views to manage incidents and areas of interest, a view to monitor the occurrence of incidents in real-time and a view to report incidents that uses geolocator to detect the user’s location.
Whenever an incident is saved (or updated), a message is sent to a RabbitMQ broadcast queue. At this time, the system checks whether the incident occurred in an area of interest, and a special alert message is sent if necessary. Any subscriber to the queue (subscribers are created when a client connects to the notifications socket.io namespace) will receive the message and send a packet down the socket’s channel. It is up to the client’s JavaScript code to update the maps and generate notifications and alerts if necessary.
Although simple, the site has all the basic functionality and can be used as a basis for similar projects. The source is available on GitHub.
The model
To represent the incidents and the areas of interest, we’re going to use the following model:
from django.contrib.gis.db import models


class Incident(models.Model):
    objects = models.GeoManager()

    URGENT = 'UR'
    HIGH = 'HI'
    MEDIUM = 'ME'
    LOW = 'LO'
    INFO = 'IN'
    SEVERITY_CHOICES = (
        (URGENT, 'Urgent'),
        (HIGH, 'High'),
        (MEDIUM, 'Medium'),
        (LOW, 'Low'),
        (INFO, 'Info'),
    )

    name = models.CharField(max_length=150)
    description = models.TextField(max_length=1000)
    severity = models.CharField(max_length=2, choices=SEVERITY_CHOICES, default=MEDIUM)
    closed = models.BooleanField(default=False)
    location = models.PointField()
    created = models.DateTimeField(editable=False, auto_now_add=True)


class AreaOfInterest(models.Model):
    objects = models.GeoManager()

    name = models.CharField(max_length=150)
    severity = models.CharField(max_length=2, choices=Incident.SEVERITY_CHOICES, default=Incident.MEDIUM)
    polygon = models.PolygonField()
The Incident class represents the occurrence of an event around a specific geographic point, specified by the location field. The AreaOfInterest is used to define a region for which the user is going to be alerted if an incident is reported within it. The polygon field specifies the geographic area.
The alerts are going to be sent only when the incident’s severity is above the area’s target severity, and location is within the area’s polygon. This is done using spatial QuerySets within the Incident post_save signal handler:
areas_of_interest = [
    area_of_interest.geojson_feature
    for area_of_interest in AreaOfInterest.objects.filter(
        polygon__contains=kwargs['instance'].location,
        severity__in=kwargs['instance'].alert_severities,
    )
]
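The `alert_severities` property itself is not shown in the post. Assuming it returns every severity code ranked at or below the incident's own (so that an area's threshold severity matches any incident at least that severe), its logic could be sketched in plain Python like this; the ordering and the behaviour are inferred from the filter above, not taken from the project's source:

```python
# Severity codes ordered from most to least severe, matching the
# SEVERITY_CHOICES on the Incident model.
SEVERITY_ORDER = ['UR', 'HI', 'ME', 'LO', 'IN']

def alert_severities(incident_severity):
    """Return the severity codes an incident should trigger alerts for:
    its own level plus every less severe one. An 'UR' (urgent) incident
    alerts areas with any threshold, while an 'IN' (info) incident only
    alerts areas whose threshold is 'IN'."""
    rank = SEVERITY_ORDER.index(incident_severity)
    return SEVERITY_ORDER[rank:]
```

With this, `severity__in=instance.alert_severities` selects exactly the areas whose target severity the incident meets or exceeds.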
Sending notifications
Once a notification has been constructed, we connect to a RabbitMQ broadcast queue (using Kombu) and publish the notification:
def send_notification(notification):
    with BrokerConnection(settings.AMPQ_URL) as connection:
        with producers[connection].acquire(block=True) as producer:
            maybe_declare(notifications_exchange, producer.channel)
            producer.publish(
                notification,
                exchange='notifications',
                routing_key='notifications'
            )
When a user accesses the site’s home view, it connects to a socket.io namespace:
from django.conf import settings
from kombu import BrokerConnection
from kombu.mixins import ConsumerMixin
from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace

from .queues import notifications_queue


@namespace('/notifications')
class NotificationsNamespace(BaseNamespace):
    def __init__(self, *args, **kwargs):
        super(NotificationsNamespace, self).__init__(*args, **kwargs)

    def get_initial_acl(self):
        return ['recv_connect']

    def recv_connect(self):
        if self.request.user.is_authenticated():
            self.lift_acl_restrictions()
            self.spawn(self._dispatch)
        else:
            self.disconnect(silent=True)

    def _dispatch(self):
        with BrokerConnection(settings.AMPQ_URL) as connection:
            NotificationsConsumer(connection, self.socket, self.ns_name).run()
When a connection is established (and authentication is verified), a new greenlet is spawned, passing the control to a NotificationConsumer instance:
class NotificationsConsumer(ConsumerMixin):
    def __init__(self, connection, socket, ns_name):
        self.connection = connection
        self.socket = socket
        self.ns_name = ns_name

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[notifications_queue], callbacks=[self.process_notification])]

    def process_notification(self, body, message):
        self.socket.send_packet(dict(
            type='event',
            name='notification',
            args=(body,),
            endpoint=self.ns_name
        ))
        message.ack()
Each message sent to the broadcast queue is handled by the process_notification callback, which sends a new packet down the socket’s channel with the body of the notification object.
The image above is a screenshot of the site’s home page while an alert is being shown to the user. The client side of the communication is quite simple:
var socket = io.connect(
    "/notifications",
    {
        "reconnectionDelay": 5000,
        "timeout": 10000,
        "resource": "socket.io"
    }
);

socket.on('connect', function(){
    console.log('connect', socket);
});
The client connects to the socket, and hooks the appropriate callbacks. Socket.io hides away the complexities of choosing a transport layer and handling retries and reconnects. The notification callback handles most of the client’s logic.
socket.on('notification', function(notification){
    console.log('notification', notification);
    if (notification.type === "post_save") {
        if (notification.created) {
            map.data.addGeoJson(notification.feature);
        } else {
            var feature = map.data.getFeatureById(notification.feature.id);
            map.data.remove(feature);
            if (! notification.feature.properties.closed) {
                map.data.addGeoJson(notification.feature);
            }
        }
    } else if (notification.type === "post_delete") {
        var feature = map.data.getFeatureById(notification.feature.id);
        map.data.remove(feature);
    } else if (notification.type === "alert") {
        showAlert(buildAlertModalBodyHtml(notification));
    } else {
        console.log(notification);
    }
});

socket.on('disconnect', function(){
    console.log('disconnect', socket);
});
Upon receiving a notification, we use the Google Maps Data Layer API to draw onto the map. Notice that all we had to do was hand the GeoJSON representation of the objects (which is generated by GeoDjango) to the map, and the rest is taken care of for us.
Managing events and areas of interest
The site provides views to manage incidents and areas of interest that use the Google Maps JavaScript API to manipulate the objects graphically within maps. The GeoJSON format is supported by both GeoDjango and Google Maps, so we use it as the exchange format in the forms.
We also provide a simple incident report view (depicted above) that uses the geolocator JavaScript library to detect the user’s location.
Conclusions
Although the exercise proved to be really interesting (especially the spatial features), we really didn’t find any significant advantages over our previous solution based on Node.js. In fact, we had to tackle several complications related to the restrictions that gevent places on which packages can be used. First of all, we had to make sure that the libraries running inside greenlets were either gevent-specific or monkey-patching compatible (kombu is). We also had problems running the site under Gunicorn, so we had to switch to Chaussette. There was also the matter of gevent-socketio only supporting version 0.9 of the socket.io protocol (hence the bower dependency that points to the 0.9 branch of the client repo).
We hope that you find the information presented in this post useful. As usual, feel free to leave comments or suggestions on how to improve the solution.
Acknowledgements
The bulk of the notifications architecture is based on the solution presented by Jeremy West in his blog post Django, Gevent, and Socket.io. His tutorial is a great way to understand how gevent-socketio works and how to integrate it into Django.
March 17, 2015 01:33 PM
As we previously wrote, signup for our free Sponsor Workshops is open and the schedule has now been completed! While registration isn't required, it helps us plan for room sizes and for drinks and snacks, so head to Eventbrite and choose as many as you want!
Wednesday morning gets under way at 9 AM with a team from Elastic taking attendees through the popular Elasticsearch distributed search engine. Honza Král will introduce the various Python clients for working with Elasticsearch, and will be joined by Logstash developer Pier-Hughes and Peter from their solutions engineering team. The full description is available at https://us.pycon.org/2015/schedule/presentation/475/.
The 3:30 PM Wednesday slot features Mark Lavin, Caleb Smith, and David Ray of Caktus Group taking the stage to share their knowledge of RapidSMS and Django. We previously wrote about how they've used SMS while building a voter registration system in Libya, so come see first hand how they do it. The talk is beginner friendly so bring a laptop to check out the code and follow along.
The last slot on Thursday, running from 3:30 to 5:00, will be a trio of talks from Google. Brian Dorsey will be on hand to show how Kubernetes can scale up your usage of Docker, complete with a live demo (he gives great demos btw). The second talk will be on CoLaboratory by Jeff Snyder, covering the project, its integration with Google Drive, and further integrations with IPython and now the Jupyter project. Finally, Alex Perry will cover the use of Python decorators within monitoring pipelines to deliver positive value with minimal impact.
Be sure to sign up today!
March 17, 2015 12:48 PM
Another project night where we will focus on our HAB project: sending a technical payload into space, and back, as part of the 2015 Global Space Balloon Challenge (http://balloonchallenge.org/). The payload will pay homage to the first NASA balloon flights in 1969, designed to take large-area photographs of the earth from a very high altitude. It will include a computer with an operating system, many Python scripts, and various hardware including sensors, transmitters and other tech gear.
The monthly project nights until April will focus on building a high altitude balloon to send into near space. There is something for everyone to do, from art, to programming, to mechanical and electrical engineering, to finding stuff, reading regulations, making recovery plans, buying stuff, coming up with a team name, deciding what experiments should be included in the payload, etc. Don't wait for a direct invitation, sign up on our meetup group:
http://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/220377127/
This meeting will be on Wednesday, Mar. 18 at 6pm in the Dash room at Inmar:
635 Vine St, Room 1130H "Dash", Winston-Salem, NC. This will be at the Inmar building in downtown Winston-Salem.
Some preliminary work has already started and discussion is ongoing on the PYPTUG mailing list:
https://groups.google.com/forum/#!forum/pyptug
Look for the Near Space Technical Payload Official Thread (should be at the top).
Keep an eye on this site for progress reports. At launch, you will be able to track the actual balloon through a web page.
Note: this is tomorrow, Wednesday the 18th. Come by and learn how to network multiple Raspberry Pi Model A+ boards without Ethernet...
March 17, 2015 12:20 PM
Yesterday Cerberus 0.8.1 was released with a few little fixes, one of them being more of a new feature than a fix: sub-document fields can now be set as field dependencies by using a ‘dotted’ notation. So, suppose we set the following validation schema: schema = { 'test_field': { 'dependencies': [ 'a_dict.foo', 'a_dict.bar' ] }, […]
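To illustrate the idea behind the dotted notation, here is a plain-Python sketch of the semantics (this mimics what a dotted dependency means, it is not Cerberus's actual implementation): a dependency like 'a_dict.foo' is satisfied only if each key on the path exists in the nested document being validated.

```python
def dotted_lookup(document, path):
    """Resolve a dotted dependency path like 'a_dict.foo' against a
    nested dict. Return (True, value) if every key on the path exists,
    and (False, None) otherwise."""
    current = document
    for key in path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return (False, None)
        current = current[key]
    return (True, current)
```

With the schema above, a document like {'test_field': 1, 'a_dict': {'foo': 'x', 'bar': 'y'}} satisfies both dependencies, while one missing 'a_dict.bar' would fail validation.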
March 17, 2015 09:16 AM
Morning all!
XFS -> ext4
So the reason for our extra-long maintenance window this morning was primarily a migration from XFS to ext4 as the filesystem for user storage. We'll write more about the whys and wherefores of this later, but the short version is that the main reason for using XFS, project quotas, was no longer needed, and a bug in the version of XFS supported by Ubuntu LTS left us vulnerable to long periods of downtime after unplanned reboots, while XFS ran some unnecessary quotachecks. The switch to ext4 removes that risk, and has simplified some of our code too. Bonus!
In other news, we've managed to squeeze in a few more user-visible improvements :)
Features bump for paid plans
We've decided to tweak the pricing and accounts pages so that all plans are customisable. As a bonus side-effect, we've slightly improved all the existing paid plans, so our beloved customers are going to get some free stuff:
- All Hacker plans now allow you to replace your .pythonanywhere.com domain with a custom one
- We've bumped the disk space for Hacker plans from 512MB to 2GB
- And we've bumped the Web Developer CPU quota from 3000 to 4000 seconds
Package installs
bottlenose, python-amazon-simple-product-api, py-bcrypt, Flask-Bcrypt, flask-restful, markdown (for Python 3), wheezy.template, pydub, and simpy (for Python 3) are now part of our standard "batteries included".
Pip wheels available
We've rewritten our server build scripts to use wheels, and to build a wheel for each package we install. We've made them available (at /usr/share/pip-wheels) and added that location to the PythonAnywhere default pip config. So, if you're installing things into a virtualenv and we already happen to have a wheel for the package you want, pip will find it and the install will complete much faster.
Python 3 is now the default for save + run
The "Save and Run" button at the top of the editor, much beloved of teachers and beginners (and highly relevant for our education beta) now defaults to Python 3. It's 2015, this is the future after all. We didn't want to break things for existing users, so they will still have 2 as the default, but we can change that for you if you want. Just drop us a line to support@pythonanywhere.com
Security and performance improvements
Other than that, we've added a few minor security and performance tweaks.
Onwards and upwards!
March 17, 2015 07:42 AM
Several recent blog posts have focused on Python-related and PSF-funded activities in Africa and the Middle East. But the Python community is truly global, and it has been exciting to witness its continued growth. New groups of people are being introduced to Python and to programming so frequently that it’s difficult to keep up with the news. Not only that, but the scope and lasting impact of work being accomplished by Pythonistas with very modest financial assistance from the PSF is astonishing.
One example is the recent work in South America by Manuel Kaufmann. Manuel’s project is to promote the use of Python “to solve daily issues for common users.” His choice of Python as the best language to achieve this end is due to his commitment to “the Software Libre philosophy,” in particular, collaboration rather than competition, as well as Python's ability “to develop powerful and complex software in an easy way.”
Toward this end, one year ago, Manuel began his own project, spending his own money and giving his own time, traveling to various South American cities by car (again, his own), organizing meet-ups, tutorials, sprints, and other events to spread the word about Python and its potential to solve everyday problems (see Argentina en Python).
This definitely got the PSF's attention, so in January 2015, the PSF awarded him a $3,000 (USD) grant. With this award, Manuel has been able to continue his work, conducting events that have established new groups that are currently expanding further. This ripple effect of a small investment is something that the PSF has seen over and over again.
On January 17, Resistencia, Argentina was the setting for its first-ever Python Sprint. It was a fairly low-key affair, held at a pub/restaurant “with good internet access.” There were approximately 20 attendees (including 4 young women), who were for the most part beginners. After a general introduction, they broke into 2 work groups, with Manuel leading the beginners' group (see Resistencia, Chaco Sprint), guiding them through some introductory materials and tutorials (e.g., Learning Python from PyAr's wiki).

Foto grupal con todos los asistentes (group photo of all attendees).
Photo credit: Manuel Kaufmann
As can happen, momentum built, and the Sprint was followed by a Meet-up on January 30 to consolidate gains and to begin to build a local community. The Meet-up's group of 15 spent the time exploring the capabilities of Python, Brython, Javascript, Django, PHP, OpenStreetMap, and more, in relation to needed projects, and a new Python community was born (see Meetup at Resistencia, Chaco).
The next event in Argentina, the province of Formosa's first official Python gathering, was held on February 14. According to Manuel, it was a great success, attended by around 50 people. The day was structured to have more time for free discussion, which allowed for more interaction and exchange of ideas. In Manuel’s opinion, this structure really helped to forge and strengthen the community. The explicit focus on real world applications, with discussion of a Python/Django software application developed for and currently in use at Formosa’s Tourist Information Office, was especially compelling and of great interest to the attendees. See PyDay Formosa, and for pictures, see PyDay Pics.
It looks as though these successes are just the beginning: Manuel has many more events scheduled. You can learn more and follow Manuel’s project at the links provided and on Twitter. And stay tuned to this blog, because I plan to cover more of his exciting journey to bring Python, open source, and coding empowerment to many more South Americans.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 17, 2015 12:51 AM
March 16, 2015
The Unofficial Python Job Board is a 100% free, community-run job board.
To add a job vancancy/posting simply send a pull request ( yes you read that correctly ) to Python Job Repository
Submitting a Job Vacancy/Posting/Ad
The jobs board is generated automatically from the git repository, and hosted using github pages.
All adverts are held as Markdown files, with an added header, under the jobs/ directory. Job files should look something like this:
---
title: <Job Advert Title (required)>
company: <Your Company (required)>
url: <Link to your site/a job spec (optional)>
location: <Where is the job based?>
contract: permanent (or contract/temporary/part-time ...)
contact:
    name: <Your name (required)>
    email: <Email address applicants should submit to (required)>
    phone: <Phone number (optional)>
    ...: ...
created: !!timestamp '2015-02-20' # The date the job was submitted
tags:
    - london
    - python
    - sql
---
Full job description here, in Markdown format
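For instance, a filled-in advert might look like the following (every value below is invented purely for illustration):

```markdown
---
title: Senior Python Developer
company: Example Corp
url: http://example.com/jobs/senior-python
location: London, UK
contract: permanent
contact:
    name: Jane Smith
    email: jobs@example.com
created: !!timestamp '2015-03-01'
tags:
    - london
    - python
    - django
---
We are looking for a senior Python developer to join our web team.
You will work on Django applications backed by PostgreSQL...
```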
To add your job, submit a pull request to this repo that adds a single file to the jobs/ directory. This file should match the example above.
Each pull request is validated automatically and then reviewed manually before being added to the site. If the pull request fails the validation testing (run on Travis), you must fix it before the pull request can proceed.
Previewing your submission
To preview your submission before creating a pull request, there are a number of steps to follow:
- Install hyde (hyde.github.io): <code>pip install hyde</code>
- Install fin: <code>pip install fin</code>
- Clone/check out the https://github.com/pythonjobs/template repository
- Within this clone, put your new file in <code>hyde/content/jobs/[job_filename].html</code>
- Delete the contents of the <code>deploy</code> directory
- From within <code>hyde/</code>, run <code>hyde serve</code>
- Open a web browser and navigate to http://localhost:8080/
March 16, 2015 01:39 PM
This week, we welcome Eli Bendersky (@elibendersky) as our PyDev of the Week. I have enjoyed reading his blog over the years as he writes some pretty interesting articles on Python. You can see some of the projects he works on over at GitHub. Let’s spend a few minutes getting to know our fellow Pythoneer!
Can you tell us a little about yourself (hobbies, education, etc):
I hold a B.Sc in Electrical Engineering, and have been employed in both hardware and software engineering positions over the years. In the past few years I mostly gravitated towards system programming, infrastructure and tooling – working on things like compilers, debuggers and other low-level stuff.
As for hobbies, I guess kids count? That’s definitely what takes most of my off-work time nowadays.
Other than family, I occasionally manage to carve out some free time for reading, exercising and self-education on topics ranging from programming and math to biology. I use my blog (http://eli.thegreenplace.net) as an outlet to document things I learned that I found most interesting.
Why did you start using Python?
I’m fairly late to the game, actually. In addition to C and C++, my main work hammer was Perl for the first part of my career. Eventually I became disillusioned with it, and after a brief fling with Ruby, ended up in Python-land in 2008. I haven’t looked back since. I’ll be forever thankful to Python for igniting my love for programming and open-source on a whole new level. Switching to Python turned out to be a great decision, given how much momentum the language has gained since 2008.
What other programming languages do you know and which is your favorite?
So I mentioned C and C++. C++ has been mostly paying the bills for me in the last few years, but Python is always in the picture. Other than that, I used to know Perl pretty well, dabbled with Ruby, Common Lisp and Scheme. I can find my way around Javascript most days. A bunch of assembly languages (from the more standard x86 to esoteric things like various DSPs and microcontrollers). Over the years I’ve written bits and pieces of code in Ada, Java and Matlab. There’s also the list of “languages to look at in the future”, right now it includes things like Go and Erlang. My favorite is Python, though. It’s always the first tool I reach for.
What projects are you working on now?
At work I’m hacking on all kinds of internal stuff I can’t talk much about, but some of it percolates upstream into the LLVM (compiler infrastructure) and Clang (C++ front-end) open-source projects. I have a bunch of small open-source projects I have authored and now maintain (mostly Python packages) – my Github account (https://github.com/eliben) has all the details. As a core Python developer my activity comes in short and rare bursts, unfortunately. And there’s my blog, which is a kind of an ongoing project, I guess.
Which Python libraries are your favorite (core or 3rd party)?
I’m a big believer in keeping dependencies small, and developing the core parts of a project yourself. Therefore, my favorite Python libraries are the included batteries – the stdlib. Often folks look for 3rd party libraries for things that are sufficiently served by stdlib modules. I think there’s great value in sticking to the core as much as possible, because it’s part of a common language all Python programmers speak.
The Python ecosystem is powerful, though, and some 3rd party “libraries” are complete frameworks with a bunch of subsystems of their own. I really like using Django for web apps, for example – I’d see no reason to develop a web framework of my own, or to use any of the infinitude of “micro-frameworks” popping up. Django is so well entrenched, it’s a common idiom and language many programmers understand – it has an ecosystem of its own. Another example is the excellent scientific stack Python has – Numpy, Scipy, matplotlib, etc. I’m really excited about the central place Python is taking in the world of “big data” thanks to these technologies.
Is there anything else you’d like to say?
Python presents an interesting challenge to programmers. Used correctly, it’s easy to write extremely readable and maintainable Python code. This is an almost unique quality of Python among the modern programming languages.
But the language is powerful, and with some creativity you can create unintelligible monstrosities that are definitely clever, but not very collaboration-friendly. Stick to the simple things as much as possible; if you find you really need to use some metaclass magic or something similarly advanced, encapsulate it well and hide it from most of the code. And don’t forget to document it very well. So stick to the KISS principle, basically.
Thanks so much!
The Last 10 PyDevs of the Week
March 16, 2015 12:30 PM
Caktus has been involved in quite a few projects (Libyan voter registration, UNICEF Project Mwana, and several others) that include text messaging (a.k.a. Short Message Service, or SMS), and we always use RapidSMS as one of our tools. We've also invested our own resources in supporting and extending RapidSMS.
There are other options; why do we consistently choose RapidSMS?
What is RapidSMS
First, what is RapidSMS? It's an open source package of useful tools that extend the Django web development framework to support processing text messages. It includes:
- A framework for writing code that is invoked when a text message is received, and that responds to it
- A set of backends - pluggable code modules that can interface to various ways of connecting your Django program to the phone network to pass text messages back and forth
- Sample applications
- Documentation
The backends are required because unlike email, there's no universal standard for sending and receiving text messages over the Internet. Often we get access to the messages via a third party vendor, like Twilio or Tropo, that provides a proprietary interface. RapidSMS isolates us from the differences among vendors.
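The idea can be sketched in a few lines. Note these class names are hypothetical illustrations of the pluggable-backend pattern, not RapidSMS's actual API:

```python
# Sketch of the pluggable-backend idea: application code is written
# against one interface, and each vendor gets its own implementation.
# (Hypothetical names for illustration; not RapidSMS's real classes.)

class BackendBase:
    """Common interface: application code only ever calls send()."""
    def send(self, identity, text):
        raise NotImplementedError


class ConsoleBackend(BackendBase):
    """A testing backend that just records outgoing messages."""
    def __init__(self):
        self.outbox = []

    def send(self, identity, text):
        self.outbox.append((identity, text))


def reply(backend, sender, text):
    # Application logic depends only on BackendBase, so swapping
    # Twilio for Tropo (or a GSM modem) means swapping one object.
    backend.send(sender, "You said: " + text)


backend = ConsoleBackend()
reply(backend, "+15551234567", "hello")
```

A real backend would make HTTP calls to a vendor's API in `send`, but the application code above would not change.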
RapidSMS is open source, under the BSD license, with UNICEF acting as holder of the contributors' agreements (granting a license for RapidSMS to use and distribute their contributions). See the RapidSMS license for more about this.
Alternatives
Here are some of the alternatives we might have chosen:
- Writing from scratch: starting each project new and building the infrastructure to handle text messages again
- Writing to a particular vendor's API: writing code that sends and receives text messages using the programming interface provided by one of the online vendors that provide that service, then building applications around that
- Other frameworks
Why RapidSMS
Why did we choose RapidSMS?
- RapidSMS builds on Django, our favorite web development framework.
- RapidSMS is at the right level for us. It provides components that we can use to build our own applications the way we need to, and the flexibility to customize its behavior.
- RapidSMS is open source, under the BSD license. There are no issues with our use of it, and we are free to extend it when we need to for a particular project. We then have the opportunity to contribute our changes back to the RapidSMS community.
- RapidSMS is vendor-neutral. We can build our applications without being tied to any particular vendor of text messaging services. That's good for multiple reasons:
- We don't have to pick a vendor before we can start.
- We could change vendors in the future without having to rewrite the applications.
- We can deploy applications to different countries that might not have any common vendor for messaging services.
It's worth noting that using RapidSMS doesn't even require using an Internet text messaging vendor. We can use other open source applications like Vumi or Kannel as a gateway to provide us with even more options:
- use hardware called a "cellular/GSM modem" (basically a cell phone with a connection to a computer instead of a screen)
- interface directly to a phone company's own servers over the Internet, using several widely used protocols
Summary
RapidSMS is a good fit for us at Caktus; it adds a lot to our projects, and we've been pleased to be able to contribute back to it.
Caktus will be leading a workshop on building RapidSMS applications during PyCon 2015 on Tuesday, April 7th 3:00-5:30.
March 16, 2015 12:00 PM
I've written in the past about my dislike for Django's Class Based Views. Django's
CBVs add a lot of complexity and verbosity, and simply get in the way of some
moderately common patterns (e.g. when you have two forms in a single view). It
seems I'm not alone
as a Django core dev who thinks that way.
In this post, however, I'll write about a different approach that I took in one
project, which can be summed up like this:
Write your own base class.
For really simple model views, Django's own CBVs can be a
time saver. For anything more complex, you will run into difficulties, and will
need some heavy documentation at the very least.
One solution is to use a simplified re-implementation of Class Based Views. My own approach is to go even further and
start from nothing, writing your own base class, while borrowing the best ideas
and incorporating only what you need.
Steal the good ideas
The as_view
method provided by the Django's View class is a great idea — while it may
not be obvious, it was hammered out after a lot of discussion as a way to help
promote request isolation by creating a new instance of the class to handle
every new request. So I'll happily steal that!
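A minimal sketch of that idea (not Django's actual implementation, which also handles HTTP method dispatch and more) looks like this:

```python
# Sketch of the as_view() pattern: a class method returns a plain view
# function, and a fresh instance is created per request so no state
# leaks between requests via instance attributes.

class View:
    def __init__(self, **kwargs):
        # Store any configuration passed at as_view() time.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        def view(request, *args, **kwargs):
            self = cls(**initkwargs)  # new instance for every request
            self.request = request
            return self.handle(request, *args, **kwargs)
        return view

    def handle(self, request, *args, **kwargs):
        raise NotImplementedError


class Greeting(View):
    greeting = "Hello"

    def handle(self, request):
        return "{} {}".format(self.greeting, request)


view_func = Greeting.as_view(greeting="Hi")
```

Here `view_func` is an ordinary function suitable for a URLconf, and each call builds a new `Greeting` instance.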
Reject the bad
Personally I dislike the dispatch method with its assumption that handling
of GET and POST is going to be completely different, when often they can
overlap a lot (especially for typical form handling). It has even introduced
bugs for me where a view rejected POST requests, when what it needed to do was
just ignore the POST data, which required extra code!
So I replaced that with a simple handle function that you have to implement
to do any logic.
I also don't like the way that template names are automatically built from model
names etc. — this is convention over configuration, and it makes life
unnecessarily hard for a maintenance programmer who greps to find out where a
template is used. If that kind of logic is used, you just Have To Know where to
look to see if a template is used at all and how it is used. So that is going.
Flatten the stack
A relatively flat set of base classes is going to be far easier to manage than a
large set of mixins and base classes. By using a flat stack, I can avoid writing
crazy hacks to subvert what I have inherited.
Write the API you want
For instance, one of the things I really dislike about Django's CBVs is the
extremely verbose way of adding new data to the context, which is something that
ought to be really easy, but instead requires 4 lines:
class MyView(ParentView):
    def get_context_data(self, **kwargs):
        context = super(MyView, self).get_context_data(**kwargs)
        context['title'] = "My title"  # This is the only line I want to write!
        return context
In fact, it is often worse, because the data to add to the context may actually
have been calculated in a different method, and stuck on self so that
get_context_data could find it. And you also have the problem that it is
easy to do it wrong, e.g. if you forget the call to super, things start breaking in
non-obvious ways.
(In searching GitHub for examples, I actually found hundreds and hundreds of
examples that look like this:
class HomeView(TemplateView):
    # ...
    def get_context_data(self):
        context = super(HomeView, self).get_context_data()
        return context
This doesn't make much sense, until I realised that people are using boilerplate
generators/snippets to create new CBVs — such as this for emacs
and this for vim,
and this for Sublime Text.
You know you have created an unwieldy API when people need these kinds of
shortcuts.)
So, the answer is:
Imagine the API you want, then implement it.
This is what I would like to write for static additions to the context:
class MyView(ParentView):
    context = {'title': "My title"}
and for dynamic:
class MyView(ParentView):
    def context(self):
        return {'things': Thing.objects.all()
                if self.request.user.is_authenticated()
                else Thing.objects.public()}

    # Or perhaps using a lambda:
    context = lambda self: ...
And I would like any context defined by ParentView to be automatically
accumulated, even though I didn't explicitly call super. (After all, you
almost always want to add to context data, and if necessary a subclass could
remove specific inherited data by setting a key to None).
I'd also like for any method in my CBV to simply be able to add data to the
context directly, perhaps by setting/updating an instance variable:
class MyView(ParentView):
    def do_the_thing(self):
        if some_condition():
            self.context['foo'] = 'bar'
Of course, it goes without saying that this shouldn't clobber anything at the
class level and violate request isolation, and all of these methods should work
together nicely in the way you would expect. And it should be impossible to
accidentally update any class-defined context dictionary from within a method.
Now, sometimes after you've finished dreaming, you find your imagined API is too
tricky to implement due to a language issue, and has to be modified. In this
case, the behaviour is easily achievable, although it is a little bit magic,
because normally defining a method in a subclass without using super means
that the super class definition would be ignored, and for class attributes you
can't use super at all.
So, my own preference is to make this more obvious by using the name
magic_context for the first two (the class attribute and the method). That
way I get the benefits of the magic, while not tripping up any maintainer — if
something is called magic_foo, most people are going to want to know why it
is magic and how it works.
The implementation
uses a few tricks, the heart of which is using
reversed(self.__class__.mro()) to get all the super-classes and their magic_context attributes,
iteratively updating a dictionary with them.
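A sketch of how that accumulation can work (my reconstruction of the idea, not the exact implementation):

```python
# Walk the MRO from base class to subclass, merging each class's
# magic_context into one dict. Subclasses override base classes
# without any explicit super() calls, and both dict attributes and
# methods are supported.

class View:
    magic_context = {}

    def get_magic_context(self):
        context = {}
        for cls in reversed(self.__class__.mro()):
            mc = cls.__dict__.get('magic_context')
            if mc is None:
                continue
            # A method defined in the class shows up here as a plain
            # function, so call it with self; otherwise treat it as a dict.
            context.update(mc(self) if callable(mc) else mc)
        return context


class ParentView(View):
    magic_context = {'site_name': "My site"}


class MyView(ParentView):
    def magic_context(self):
        return {'title': "My title"}
```

With this, `MyView().get_magic_context()` contains both the parent's static data and the subclass's dynamic data.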
Notice too how the TemplateView.handle method is extremely simple, and just
calls out to another method to do all the work:
class TemplateView(View):
    # ...
    def handle(self, request):
        return self.render({})
This means that a subclass that defines handle to do the actual logic
doesn't need to call super, but just calls the same method directly:
class MyView(TemplateView):
    template_name = "mytemplate.html"

    def handle(self, request):
        # logic here...
        return self.render({'some_more': 'context_data'})
In addition to these things, I have various hooks that I use to handle things
like AJAX validation for form views, and RSS/Atom feeds for list views etc.
Because I'm in control of the base classes, these things are simple to do.
Conclusion
I guess the core idea here is that you shouldn't be constrained by what Django
has supplied. There is actually nothing about CBVs that is deeply integrated
into Django, so your own implementation is just as valid as Django's, but you can
make it work for you. I would encourage you to write the actual code you
want to write, then make the base class that enables it to work.
The disadvantage, of course, is that maintenance programmers who have memorised
the API of Django's CBVs won't benefit from that in the context of a project
which uses another set of base classes. However, I think the advantages more
than compensate for this.
Feel free to borrow any of the code or ideas if they are useful!
March 16, 2015 10:35 AM

In December, I wrote that we are removing the idiosyncratic use_greenlets option from PyMongo when we release PyMongo 3.
In PyMongo 2 you have two options for using Gevent. First, you can do:
from gevent import monkey; monkey.patch_all()
from pymongo import MongoClient
client = MongoClient()
Or:
from gevent import monkey; monkey.patch_socket()
from pymongo import MongoClient
client = MongoClient(use_greenlets=True)
In the latter case, I wrote, "you could use PyMongo after calling Gevent's patch_socket without having to call patch_thread. But who would do that? What conceivable use case had I enabled?" So I removed use_greenlets in PyMongo 3; the first example code continues to work but the second will not.
In the comments, PyMongo user Peter Hansen replied,
I hope you're not saying that the only way this will work is if one uses monkey.patch_all, because, although this is a very common way to use Gevent, it's absolutely not the only way. (If it were, it would just be done automatically!) We have a large Gevent application here which cannot do that, because threads must be allowed to continue working as regular threads, but we monkey patch only what we need which happens to be everything else (with monkey.patch_all(thread=False)).
So Peter, Bernie, and I met online and he told us about his very interesting application. It needs to interface with some C code that talks an obscure network protocol; to get the best of both worlds his Python code uses asynchronous Gevent in the main thread, and it avoids blocking the event loop by launching Python threads to talk with the C extension. Peter had, in fact, perfectly understood PyMongo 2's design and was using it as intended. It was I who hadn't understood the feature's use case before I diked it out.
So what now? I would be sad to lose the great simplifications I achieved in PyMongo by removing its Gevent-specific code. Besides, occasional complaints from Eventlet and other communities motivated us to support all frameworks equally.
Luckily, Gevent 1.0 provides a workaround for the loss of use_greenlets in PyMongo. Beginning the same as the first example above:
from gevent import monkey; monkey.patch_all()
from pymongo import MongoClient

client = MongoClient()

def my_function():
    # Call some C code that drops the GIL and does
    # blocking I/O from C directly.
    pass

start_new_thread = monkey.saved['thread']['start_new_thread']
real_thread = start_new_thread(my_function, ())
I checked with Gevent's author Denis Bilenko whether monkey.saved was a stable API and he confirmed it is. If you use Gevent and PyMongo as Peter does, port your code to this technique when you upgrade to PyMongo 3.
Image: Wingchi Poon, CC BY-SA 3.0
March 16, 2015 02:29 AM
This work is supported by Continuum Analytics
and the XDATA Program
as part of the Blaze Project
tl;dr We benchmark several options to store Pandas DataFrames to disk.
Good options exist for numeric data but text is a pain. Categorical dtypes
are a good option.
Introduction
For
dask.frame
I need to read and write Pandas DataFrames to disk. Both disk bandwidth and
serialization speed limit storage performance.
- Disk bandwidth, between 100MB/s and 800MB/s for a notebook hard drive, is
limited purely by hardware. Not much we can do here except buy better
drives.
- Serialization cost though varies widely by library and context. We can be
smart here. Serialization is the conversion of a Python variable (e.g.
DataFrame) to a stream of bytes that can be written raw to disk.
Typically we use libraries like pickle to serialize Python objects. For
dask.frame we really care about doing this quickly so we’re going to also
look at a few alternatives.
Contenders
- pickle - The standard library pure Python solution
- cPickle - The standard library C solution
- pickle.dumps(data, protocol=2) - pickle and cPickle support multiple protocols. Protocol 2 is good for numeric data.
- json - Using the standard library json module, we encode the values and index as lists of ints/strings.
- json-no-index - Same as above, except that we don’t encode the index of the DataFrame (e.g. 0, 1, ...). We’ll find that JSON does surprisingly well on pure text data.
- msgpack - A binary JSON alternative
- CSV - The venerable pandas.read_csv and DataFrame.to_csv
- hdfstore - Pandas’ custom HDF5 storage format
Additionally we mention but don’t include the following:
- dill and cloudpickle - Formats commonly used for function serialization. These perform about the same as cPickle.
- hickle - A pickle interface over HDF5. This does well on NumPy data but doesn’t support Pandas DataFrames well.
Experiment
Disclaimer: We’re about to issue performance numbers on a toy dataset. You
should not trust that what follows generalizes to your data. You should
look at your own data and run benchmarks yourself. My benchmarks lie.
We create a DataFrame with two columns, one with numeric data, and one with
text. The text column has repeated values (1000 unique values, each repeated
1000 times) while the numeric column is all unique. This is fairly typical of
data that I see in the wild.
df = pd.DataFrame({'text': [str(i % 1000) for i in range(1000000)],
                   'numbers': range(1000000)})
Now we time the various dumps and loads methods of the different
serialization libraries and plot the results below.

As a point of reference writing the serialized result to disk and reading it
back again should take somewhere between 0.05s and 0.5s on standard hard
drives. We want to keep serialization costs below this threshold.
Thank you to Michael Waskom for making those
charts
(see twitter conversation
and his alternative
charts)
Gist to recreate plots here:
https://gist.github.com/mrocklin/4f6d06a2ccc03731dd5f
Further Disclaimer: These numbers average from multiple repeated calls to
loads/dumps. Actual performance in the wild is likely worse.
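For reference, a stripped-down sketch of such a timing harness (the bench helper is my illustration, shown with plain Python containers instead of a DataFrame so it needs no dependencies):

```python
# Time dumps/loads for one serialization option, taking the best of
# several runs to smooth out noise. (Illustrative sketch, not the
# exact benchmark script.)
import pickle
import time

def bench(dumps, loads, data, repeat=3):
    """Return (dump_seconds, load_seconds), best of `repeat` runs."""
    dump_times, load_times = [], []
    for _ in range(repeat):
        t0 = time.time()
        payload = dumps(data)
        dump_times.append(time.time() - t0)
        t0 = time.time()
        loads(payload)
        load_times.append(time.time() - t0)
    return min(dump_times), min(load_times)

# Same shape of data as the DataFrame above, at a smaller scale.
data = {'text': [str(i % 1000) for i in range(100000)],
        'numbers': list(range(100000))}

dump_t, load_t = bench(
    lambda d: pickle.dumps(d, protocol=2), pickle.loads, data)
```

Swapping the two lambdas lets you compare json, msgpack, and friends on the same data.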
Observations
We have good options for numeric data but not for text. This is unfortunate;
serializing ASCII text should be cheap. We lose here because we store text in
a Series with the NumPy dtype ‘O’ for generic Python objects. We don’t have a
dedicated variable length string dtype. This is tragic.
For numeric data the successful systems record a small amount of
metadata and then dump the raw bytes. The main takeaway from this is that you
should use the protocol=2 keyword argument to pickle. This option isn’t
well known but strongly impacts performance.
Note: Aaron Meurer notes in the comments that for Python 3 users protocol=3
is already default. Python 3 users can trust the default protocol= setting
to be efficient and should not specify protocol=2.
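The size difference is easy to demonstrate with the standard library alone; protocol 0 writes an ASCII-based format while protocol 2 writes compact binary opcodes:

```python
# Compare pickle payload sizes for numeric data across protocols.
import pickle

nums = list(range(100000))
ascii_payload = pickle.dumps(nums, protocol=0)   # text-based format
binary_payload = pickle.dumps(nums, protocol=2)  # compact binary format

# Protocol 2 is typically several times smaller on numeric data,
# and both round-trip losslessly.
print(len(ascii_payload), len(binary_payload))
assert len(binary_payload) < len(ascii_payload)
assert pickle.loads(binary_payload) == nums
```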

Some thoughts on text
- Text should be easy to serialize. It’s already text!
- JSON-no-index serializes the text values of the dataframe (not the integer index) as a list of strings. This assumes that the data are strings, which is why it’s able to outperform the others, even though it’s not an optimized format. This is what we would gain if we had a string dtype rather than relying on the NumPy Object dtype, 'O'.
- MsgPack is surprisingly fast compared to cPickle.
- MsgPack is oddly unbalanced; it can dump text data very quickly but takes a while to load it back in. Can we improve msgpack load speeds?
- CSV text loads are fast. Hooray for pandas.read_csv.
Some thoughts on numeric data
- Both pickle(..., protocol=2) and msgpack dump raw bytes. These are well below disk I/O speeds. Hooray!
- There isn’t much reason to compare performance below this level.
Categoricals to the Rescue
Pandas recently added support for categorical
data. We
use categorical data when our values take on a fixed number of possible options
with potentially many repeats (like stock ticker symbols.) We enumerate these
possible options (AAPL: 1, GOOG: 2, MSFT: 3, ...) and use those numbers
in place of the text. This works well when there are many more
observations/rows than there are unique values. Recall that in our case we
have one million rows but only one thousand unique values. This is typical for
many kinds of data.
This is great! We’ve shrunk the amount of text data by a factor of a thousand,
replacing it with cheap-to-serialize numeric data.
>>> df['text'] = df['text'].astype('category')
>>> df.text
0 0
1 1
2 2
3 3
...
999997 997
999998 998
999999 999
Name: text, Length: 1000000, dtype: category
Categories (1000, object): [0 < 1 < 10 < 100 ... 996 < 997 < 998 < 999]
Let’s consider the costs of doing this conversion, and of serializing it
afterwards, relative to the cost of just serializing the raw text:

    Operation                      Seconds
    Serialize Original Text       1.042523
    Convert to Categories         0.072093
    Serialize Categorical Data    0.028223
When our data is amenable to categories then it’s cheaper to
convert-then-serialize than it is to serialize the raw text. Repeated
serializations are just pure-win. Categorical data is good for other reasons
too; computations on object dtype in Pandas generally happen at Python speeds.
If you care about performance then categoricals are definitely something to
roll in to your workflow.
Final Thoughts
- Several excellent serialization options exist, each with different
strengths.
- A combination of good serialization support for numeric data and
Pandas categorical dtypes enable efficient serialization and storage of
DataFrames.
- Object dtype is bad for PyData. String dtypes would be nice. I’d like to give a shout-out to DyND, a possible NumPy replacement that would resolve this.
- MsgPack provides surprisingly good performance over custom Python solutions; why is that?
- I suspect that we could improve performance by special casing Object dtypes
and assuming that they contain only text.
March 16, 2015 12:00 AM
March 15, 2015
I am reaching the final stages of my new book. Here are a few ways to stay updated about the book:

Blog posts: http://echorand.me/category/doingmathwithpython/
Facebook page: https://www.facebook.com/doingmathwithpython
G+ Community: https://plus.google.com/u/0/communities/113121562865298236232
Twitter: https://twitter.com/mathwithpython
If you are an educator/teacher, I can also try to get you a sample of the current pre-release version of the book.
March 15, 2015 09:43 AM
March 14, 2015
The Python 3 OOP series of posts that you can find here is now available as a series of IPython Notebooks.
From the official site:
The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.
As a matter of fact, IPython Notebook is the perfect environment to teach Python itself. If you want to know more about this wonderful piece of software, check the official site.
You can find the notebook of each post here
or download the whole series as a zip file
As everything on this blog, notebooks are released under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Feel free to submit corrections to the GitHub issues page.
This is a preview of the notebooks in action

March 14, 2015 09:00 PM
One of the things that is very useful about Python is its extreme
introspectability and malleability. Taken too far, it can make your code
an unmaintainable mess, but it can be very handy when trying to debug
large and complex projects.
Open edX is one such project. Its main
repository has about 200,000 lines of Python spread across 1500 files.
The test suite has 8000 tests.
I noticed that running the test suite left a number of temporary directories
behind in /tmp. They all had names like tmp_dwqP1Y, made by the tempfile
module in the standard library. Our tests have many calls to mkdtemp,
which requires the caller to delete the directory when done. Clearly, some
of these cleanups were not happening.
To find the misbehaved code, I could grep through the code for calls to
mkdtemp, and then reason through which of those calls eventually deleted
the file, and which did not. That sounded tedious, so instead I took the
fun route: an aggressive monkeypatch to find the litterbugs for me.
My first thought was to monkeypatch mkdtemp itself. But most uses of the
function in our code look like this:
from tempfile import mkdtemp
...
d = mkdtemp()
Because the function was imported directly, if my monkeypatching code ran
after this import, the call wouldn't be patched. (BTW, this is one more
small reason to prefer importing modules, and using module.function in the
code.)
Looking at the implementation of mkdtemp, it makes use of a helper function
in the tempfile module, _get_candidate_names. This helper is a generator
that produces those typical random tempfile names. If I monkeypatched that
internal function, then all callers would use my code regardless of how
they had imported the public function. Monkeypatching the internal helper
had the extra advantage that using any of the public functions in tempfile
would call that helper, and get my changes.
To find the problem code, I would put information about the caller into the
name of the temporary file. Then each temp file left behind would be a
pointer of sorts to the code that created it. So I wrote my own
_get_candidate_names like this:
import inspect
import os.path
import tempfile

real_get_candidate_names = tempfile._get_candidate_names

def get_candidate_names_hacked():
    stack = "-".join(
        "{}{}".format(
            os.path.basename(t[1]).replace(".py", ""),
            t[2],
        )
        for t in inspect.stack()[4:1:-1]
    )
    for name in real_get_candidate_names():
        yield "_" + stack + "_" + name

tempfile._get_candidate_names = get_candidate_names_hacked
This code uses inspect.stack to get the call stack. We slice it oddly, to
get the closest three calling frames in the right order. Then we extract
the filenames from the frames, strip off the ".py", and concatenate them
together along with the line number. This gives us a string that indicates
the caller.
The real _get_candidate_names function is used to get a generator of good
random names, and we add our stack inspection onto the name, and yield
it.
Then we can monkeypatch our function into tempfile. Now as long as this
module gets imported before any temporary files are created, the files
will have names like this:
tmp_case53-case78-test_import_export289_DVPmzy/
tmp_test_video36-test_video143-tempfile455_2upTdS.srt
The first shows that the file was created in test_import_export.py at line 289, called
from case.py line 78, from case.py line 53. The second shows that
test_video.py has a few functions calling eventually into tempfile.py.
I would be very reluctant to monkeypatch private functions inside other
modules for production code. But as a quick debugging trick, it works
great.
March 14, 2015 06:38 PM
I have a webserver with 3 WSGI applications running on different domains (1, 2, 3), all deployed with a combination of Gunicorn and NGINX. It's a combination that works really well, but there are two annoyances that are only going to get worse the more sites I deploy:
A) The configuration for each server resides in a different location on the filesystem, so I have to recall & type a long path to edit settings.
B) More significantly, each server adds extra resource requirements. I follow the advice of running each WSGI application with (2 * number_of_cores + 1) processes, each with 8 threads. The threads may be overkill, but that ensures that the server can use all available capacity to handle dynamic requests. On my 4 core server, that's 9 processes, 72 threads per site. Or 27 processes, and 216 threads for the 3 sites. Clearly that's not scalable if I want to host more web applications on one server.
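The worker arithmetic above is conventionally captured in a Gunicorn config file, which is an ordinary Python module whose top-level names Gunicorn reads; this is a sketch of that convention, not the author's actual configuration:

```python
# gunicorn.conf.py (sketch): Gunicorn reads settings from module-level
# names when started with `gunicorn -c gunicorn.conf.py app:application`.
import multiprocessing

cores = multiprocessing.cpu_count()
workers = cores * 2 + 1   # the (2 * number_of_cores + 1) rule
threads = 8               # per-worker threads, as described above
# On a 4-core machine this yields 9 workers and 72 threads in total.
```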
A new feature recently added to Moya fixes both those problems. Rather than deploy a WSGI application for each site, Moya can now optionally create a single WSGI application that serves many sites. With this new system, configuration is read from /etc/moya/, which contains a directory structure like this:
|-- logging.ini
|-- moya.conf
|-- sites-available
| |-- moyapi.ini
| |-- moyaproject.ini
| `-- notes.ini
`-- sites-enabled
|-- moyapi.ini
|-- moyaproject.ini
`-- notes.ini
At the top level is “moya.conf” which contains a few server-wide settings, and “logging.ini” which contains logging settings. The directories “sites-available” and “sites-enabled” work like Apache and NGINX servers; settings for each site are read from “sites-enabled”, which contains symlinks to files in “sites-available”.
Gunicorn (or other wsgi server) can run these sites with a single instance by specifying the WSGI module as “moya.service:application”. This application object dispatches the request to the appropriate server (based on a domain defined in the INI).
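Moya's actual dispatcher isn't shown here, but the idea behind a single WSGI entry point serving many sites can be sketched in plain WSGI; the site names and domains below are made up for illustration:

```python
# A toy WSGI dispatcher: one entry point, many sites, selected by the
# request's Host header.
def make_site(name):
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [name.encode("utf-8")]
    return app

# In Moya, mappings like these would come from the INI files in
# sites-enabled/, each of which defines the domain it serves.
SITES = {
    "moyaproject.com": make_site("moyaproject"),
    "notes.moyaproject.com": make_site("notes"),
}

def application(environ, start_response):
    # Strip any port suffix before looking up the site.
    host = environ.get("HTTP_HOST", "").split(":")[0]
    site = SITES.get(host)
    if site is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown site"]
    return site(environ, start_response)
```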
Because all three sites go through a single Gunicorn instance, only one set of processes and threads is ever needed, and the settings files are much easier to locate. Another advantage is that adding another site requires very little extra configuration.
This new multi-server system is somewhat experimental and hasn't been documented. But since I believe in eating my own dog food, it has been live now for a whole hour, with no problems.
March 14, 2015 05:08 PM
This morning, PSF Director David Mertz announced on the PSF Members' mailing list the opening of a vote. For those of you who have already self-certified as voting members, or if you are already a Fellow of the Foundation, you should have received the announcement in a private email.
This is our first stab at using the voting mechanism to get a sense of the larger membership's views on an issue currently under discussion (the non-binding poll), so we urge you to take a moment and make your voice heard.
To review your eligibility to vote and to see the certification form, please see my previous blog post
Enroll as Voting Member or go to the
PSF Website.
Here is the announcement:
Membership Vote for Pending Sponsors and Non-Binding Poll
The candidate Sponsor Members listed below were recommended for approval by the Python Software Foundation Board of Directors. Following the ballot choices are detailed descriptions of the organizations (the submit button is after the descriptions, so scroll down for it).
This election will close on 2015-03-26.
Sponsor Member Candidates
Non-Binding Poll on PyCon Video Sublicensing
Purpose: The PSF Board of Directors is seeking the collective perspective of PSF Voting Members on the appropriate handling of video recording sublicensing for presentations at PyCon US. These videos are currently made freely available on Google's YouTube, and may be incorporated into other sites through YouTube's embedding features. There are no plans to change that arrangement, but a separate question has arisen that requires determining whether it would be appropriate to exercise the sublicensing rights granted to the PSF under the PyCon US speaker agreement. This part of the poll serves as a non-binding survey of PSF Voting Members, intended to help the Directors formulate a suitable policy in this area based on the way the PyCon US speaker agreement is generally perceived, rather than based solely on what it permits as a matter of law.
Background: A request has been made to the PSF to sublicense video recordings made at PyCon of speaker presentations. The license agreement signed by speakers gives the PSF the right to grant such sublicenses; however, the Board of Directors is of mixed opinion about whether we should do so. The release form (i.e. license) agreed to by speakers is at https://us.pycon.org/2015/speaking/recording/ for reference. Note that YouTube is explicitly mentioned in the release as an example of such a sublicensee, and pyvideo.org has always been given this right (although they have only exercised it thus far by embedding YouTube hosted videos, not by mirroring content, and hence are not technically a sublicensee at this point). Embedding a video does not require a sublicense; only mirroring it does.
There are two axes along which the Board is divided. On the one hand, we are not unanimous about whether we should grant a sublicense to commercial entities which may benefit financially by providing local copies of these video recordings, and may even potentially grant such local access only to subscribers in some manner. In favor of granting such access, some Directors feel that the more widespread the mirroring, the better, regardless of the commercial or non-commercial nature of the hosting (i.e. as long as the gratis access is never removed, which is not being contemplated). In opposition to granting such access, some Directors feel that for-profit sublicensees will gain unfair commercial advantage by bundling PyCon videos with other content sold for profit. Potentially the PSF may require payment, and gain revenue, for granting these sublicense rights.
On the other hand, we are also not unanimous about whether—if we do grant sublicenses—we should do so only prospectively, once we can inform speakers of our intent prior to their talks, or whether we should exercise the rights given in speaker releases even retroactively for previous PyCons. While speakers have given such rights already in a legal sense, some Directors feel they may not have fully contemplated that grant at the time, and only going forward, with more explicit information about sublicensing intents of the PSF, should sublicensing be allowed to other entities.
Bloomberg LP
As the market data and analysis industry leader, Bloomberg LP provides a broad portfolio of innovations to our clients. Bloomberg's Open Market Data Initiative is part of our ongoing efforts to foster open solutions for the financial services industry. This includes a set of published Python modules that are freely available to our clients at http://www.bloomberglabs.com/api/libraries/. In support of promoting further Python usage within the financial services industry, we have hosted a number of free public developer-focused events to support the Python ecosystem, including the Scientific Python community. Please refer to http://go.bloomberg.com/promo/invite/bloomberg-open-source-day-scientific-python/ and https://twitter.com/Mbussonn/status/533566917727223808. By becoming a member, we wish to further increase our support of the PSF in its mission to promote, protect, and advance the Python programming language.
Fastly
Fastly provides the PSF with unlimited free CDN services, a dedicated IP block, and hosted certificates. We also provide the PSF with free Premium Support. Over the last few months, Fastly’s comped services to the PSF totalled up to ~$20,000/month. In January 2015 alone, the PSF sent 1.7 billion requests and 132 TB through Fastly.
Python is the go-to language at Fastly for building developer tools. Python allows Fastly to rapidly prototype and deploy novel protocols and services over multiple platforms, including devices like network switches, which are traditionally not programmable. Fastly relies on Python for data analysis and to dynamically reconfigure network switching and routing to steer every request to the closest available server. These tools are instrumental in helping Fastly reliably deliver more traffic in less time.
Infinite Code
Infinite Code is a software development firm with offices in Beijing, China and Kuala Lumpur, Malaysia. We are strong believers in Free/Open Source Software and the people-centric principles of Agile Development. Python is our language of choice for software development where possible. Our recent Python developments range from high-volume, real-money gaming platforms to massively parallel gathering and transformation of large quantities of data. Our developers have been using Python since 2001.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 14, 2015 05:04 PM
The second PSF sponsored African conference I want to tell you about is Python Namibia (a mere 3,500 kilometers, or 2,175 miles, south of Cameroon). The conference, the first ever held in Namibia, took place February 2 – 5, 2015 at the University of Namibia in the city of Windhoek. The PSF provided funds at the level of "Gold Sponsorship" that were used to subsidize travel for international attendees and to purchase a banner.
Photo credit to python-namibia.org
According to an email to the PSF from organizer
Daniele Procida, “. . . the event was a success, with 65 attendees for the four days, and was met with huge enthusiasm by our Namibian hosts. I hope to be back in Namibia next year for an even bigger event, organised by the newly-established Python community there.”
The official website
Python Namibia provides additional information and thanks to the conference's additional sponsors: Cardiff University in Wales (through its Phoenix Project), The University of Namibia, and the Django/Python web agency, Divio AG in Zürich.
One of the attendees was the PSF's good friend, the geologist
Carl Trachte, who sums up his reasons for attending PyCons all around the world as:
The neat thing about country/regional conferences is that you more frequently get to talk to developers or tech professionals from that place who don’t always frequent conferences outside their area. Seeing how Python (and digital technology in general) is being used in Sub-Saharan Africa (for the establishment of a wireless network, for example), learning what the average work day is like for a Pythonista in these parts of the world - those are things you really can’t get without being there.
The four days of talks, workshops, coding, collaboration and interaction engendered such enthusiasm and interest that on the last day a group of the participants self-organized to form
“PyNam, the Python Namibia Association”.
Photo Credit to python-namibia.org
We certainly look forward to more exciting projects and events coming out of this group.
March 14, 2015 05:03 PM
One of the nice things added in Zato 2.0 is
the improved ability to store the code of one's API
services directly in a server's hot-deploy directory:
each time a file is saved, it is uploaded to the server and automatically propagated to
all the other nodes in the cluster the given server belongs to.
Now, this in itself has been doable since version 1.0, but the newest
release added a means to configure servers not to clean up the hot-deploy directory after the code
is picked up, meaning anything saved there stays until it is deleted manually.
Two cool things can be achieved thanks to it:
- Working in deploy-on-save mode
- Deploying code from a repository checkout
Initial steps
To make it all possible, navigate to each server's
server.conf
file, find hot_deploy.delete_after_pick_up, change it from True to False,
and restart all servers. This is the only time they will be restarted, promise.
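For reference, the relevant server.conf fragment looks like this once changed (the section layout is inferred from the dotted key name above):

```ini
[hot_deploy]
delete_after_pick_up=False
```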
Working in deploy-on-save mode
- Let's say your server is in /home/user/zato/server1
- Save your files in /home/user/zato/server1/pickup-dir now
- Each time it's saved, note in server.log how it's picked up and deployed
- This lets you make use of the service in the actual environment a moment after it's saved
Deploying code from a repository checkout
- Essentially, this is the deploy-on-save mode described above, working on a grander scale
- Instead of saving individual files, everything needed for a given solution is placed
in the hot-deploy pickup directory in one go
- Can be easily plugged into Jenkins or other automation tools
- You can try it right now using this sample repository prepared for the article
- Go to a server's pickup dir
- Delete anything it already contains
- Issue the command below:
$ git clone https://github.com/zatosource/hot-deploy-sample.git .
- Witness that the two services just checked out are being nicely picked up by all servers in a cluster
- This concludes the deployment: the environment has just been updated with the newest versions
of the services, and they are already operational, as can be confirmed in
web-admin
March 14, 2015 04:04 PM
March 13, 2015
My colleague Steve Stagg has created a wonderful new thing: an entirely free Python jobs board. Anybody can add a new job simply by making a GitHub pull request. It’s a work of genius because it’s the absolute simplest possible solution to a problem that’s been bothering me for weeks, and the beauty of […]
March 13, 2015 10:16 PM
Happy Friday everyone,
Today we’ll take a look at some of the basic VCS features in PyCharm that can help manage different version control systems.
You may already know that PyCharm has seamless integration with major version control systems like Git, GitHub, Subversion, Mercurial, Perforce (available only in PyCharm Professional Edition), and CVS. Even though these systems have different models and command sets, PyCharm makes life a lot easier by taking a VCS-agnostic approach to managing them wherever possible.
So here we go:
Checking out a project from a VCS
To import a project from a version control system, click the Check out from Version Control button on the Welcome screen, or use the same VCS command from the main menu:

Version Control settings
A project’s version control settings are accessed via Settings → Version Control. You can associate any of the project folders with a repository root. These associations can be removed at any time, or you can even opt to disable the version control integration entirely:

PyCharm can handle multiple VCS repositories assigned to different folders of the project hierarchy, and perform all VCS operations on them in a uniform manner.
Changes tool window and changelists
After version control is enabled for a project, you can see and manage your local changes via the Changes tool window. To quickly access the tool window, press Alt + 9 (Cmd-9 on a Mac):

All changes are organized into changelists that can be created, removed, and made active.
Quick list of VCS operations
When you need to perform a VCS operation on a currently selected file, directory, or even on the entire project, bring up the VCS operations quick-list via Alt+Back Quote (Ctrl-V on a Mac):

Show History
The history of changes is available for a set of files or directories via the VCS operations quick-list, or in the main menu VCS → <version control name> → Show History, or in the context menu → Show History:

To see all changes for a specific code snippet, use the Show History for Selection action.
Annotations
Annotations are available from the quick-list, the main menu or the context menu. They allow you to see who changed a certain line of code and when:

When you click the annotation, you will see the detailed information about the corresponding commit.
Useful shortcuts
- Commit current changelist Ctrl+K (Cmd-K on a Mac)
- Update the project Ctrl+T (Cmd-T on a Mac)
- Mark selected files and folders as added Ctrl+Alt+A (Alt-Cmd-A on a Mac)
- Mark selected files and folders as changed (checked out) via Ctrl+Alt+E (Alt-Cmd-E on a Mac)
- Show diff (available in the Changes tool window) via Ctrl+D (Cmd-D on a Mac)
- Move changes to another change list (available in the Changes tool window) via F6
- Push commits to remote repositories via Ctrl+Shift+K (Cmd-Shift-K on a Mac)
Commit options
When committing changes, PyCharm lets you perform a variety of operations:
- change the set of files to commit,
- join the changes with the previous commit by using the Amend commit option,
- reformat the changed code,
- optimize imports,
- ensure that there are no inspection warnings,
- update the copyright information,
- or even upload the changes to a remote FTP server.

Ignored files
To configure the ignored files, go to Settings → Version Control, or use the corresponding button in the Changes tool window:

The actual list of ignored files can be displayed in the Changes tool window next to the changelists by clicking the corresponding button.
Branches
With PyCharm you can easily create, switch, merge, compare and delete branches (available for Git and Mercurial only). To see a list of existing branches or create a new one, use the Branches item in the main or context menu, the VCS operations quick-list, or the widget on the right-hand side of the status bar:

For multiple repositories, PyCharm performs all VCS operations on all branches simultaneously, so you don’t need to switch between them manually.
Shelves, stashes, and patches
Shelves and Stashes help you when you need to put away some local changes without committing them to the repository, switch to the repository version of the files, and then come back to your changes later. The difference between them is that Shelves are handled by PyCharm itself and are stored in the local file system, while Stashes are kept in a VCS repository.
Patches allow you to save a set of changes to a file that can be transferred via email or file sharing and then applied to the code. They are helpful when you’re working remotely without having a constant connection to your VCS repository and still need to contribute:

Log
To see the entire list of commits in a repository, sorted and filtered by branch, user, date, folder, or even a phrase in the description, use the Log tab in the Changes tool window. This is the easiest way to find a particular commit, or to just browse through the history:

In this blog post we touched just the tip of the VCS integration iceberg. Go ahead and try this functionality in action! Here’s a tutorial that can walk you through the VCS integration features and provide additional information. And if after that you’re still craving more details, please see our online help.
That’s it for today. See you next week!
-Dmitry
March 13, 2015 08:37 PM
I previously posted about a wonderful education program utilizing Raspberry Pis (AstroPi). Here’s another one:
Since last May, Unicef has been using Raspberry Pis to educate Syrian children who have been displaced into Lebanon due to their country’s civil war. The program, called
Pi4Learning, was developed by
James Cranwell-Ward, UNICEF Lebanon Innovation Lead, and
Eliane Metni of the International Education Association.
With approximately 300,000 Syrian school children living as refugees in Lebanon with no educational resources, Unicef’s Cranwell-Ward sought an inexpensive, ready-to-go solution that could be implemented in refugee camp environments. Already a Raspberry Pi enthusiast, he paired the device with Alex Eames' Kickstarter-funded
HDMIPi screens. Working with Eliane Metni, who had been piloting Raspberry Pis at
Dhour El Shweur Public Secondary School in Lebanon, they obtained free Arabic language curriculum from Khan Academy and began providing free classes to the Syrian children.
The Pi4L program is divided into learning tracks: Core Skills Modules for ages 6 – 12 (literacy, numeracy, and science, using Khan Academy content); Technology Applications for ages 5 – 18 (Learning to Code and Coding to Learn); and Continuing Education and Certification for Teachers.
Each complete computer system costs around $100 and the Khan Academy content is stored and can be delivered offline. Currently approximately 30,000 refugees are using the program, and the goal is to continue to expand.
Both Cranwell-Ward and Metni are especially excited that the program teaches kids to code and to become creative participants in an increasingly technological world community. According to Cranwell-Ward,
“The rate at which tech is being rolled out into our lives is phenomenal and coding - or the understanding of technology and how to manipulate it - is going to be a core component of our lives and our children’s lives moving forward… There needs to be some basic understanding of what technology is, how it can be manipulated, how we can use it to help ourselves, and not just be a consumer or slave,” as quoted in
The Guardian.
One of the students is 11-year-old Zeinab Al Jusuf. There is a video about her experiences and the Unicef project at
Unicef stories.
There is also a wealth of information online about this project, so if you’re at all interested I urge you to read more. For an excellent overview by Unicef’s Luciano Calestini, see
Innovation.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 13, 2015 04:02 PM