MongoDB at the EuroPython and a new sync/async ODM: μMongo

2016-08-01 - Emmanuel Leblond python mongo

I'm just back from EuroPython, where I could share feelings about the MongoDB ODMs in Python. I learned a few things there:

People only talk about Mongoengine

Mongoengine is the most well known ODM out there, probably because it is one of the oldest and most starred on Github. Besides, it's the only one on the 1st Google page of a python mongodb search. So the alternatives to Mongoengine seem forgotten even if they are pretty numerous: humbledb, mongoalchemy, minimongo, ming, nanomongo etc.

My guess is people who come to MongoDB go with the biggest thing: if it's good they stick with it, if it's bad they don't want to make another poor choice and start relying on themselves :-(

People don't like Mongoengine

Asking what people think about Mongoengine was my little funny moment: first they always complained about how bad it is, then I announced I'm part of the Mongoengine team, and finally I could see people turning white, red or laughing depending on how deeply they shared their painful relationship with Mongoengine.

But honestly I can't blame them. I think Mongoengine's trouble comes from its inception: it was created a long time ago when MongoDB was just out.

My guess is back in those dark times there was not enough experience with this tech and the original author wanted to create an ORM for mongo to mimic what you would do with an SQL database.

For example Mongoengine allows you to add a reverse_delete_rule to a reference field. This way, if the referenced document is destroyed, Mongoengine can update the reference to null or even do a cascade deletion. Feels pretty SQL, doesn't it ? A bit too much !

The trouble is this operation is not atomic at all given it is done by Mongoengine on top of a regular pymongo driver (and MongoDB cannot do that anyway). So you end up with concurrency issues, inconsistent data if your application stops during the processing of this operation etc.
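
To make the problem concrete, here is a minimal sketch using plain dicts as stand-ins for two collections (no real MongoDB or Mongoengine involved). The names `authors`, `posts`, `Crash` and `delete_author_with_cascade` are made up for the illustration, and the order of the two steps is an assumption: the point is only that a cascade done on the client side is several independent operations.

```python
# In-memory stand-ins for two MongoDB collections (hypothetical data).
authors = {1: {"name": "alice"}}
posts = {10: {"title": "hello", "author": 1}}


class Crash(Exception):
    """Simulates the application stopping halfway through the cascade."""


def delete_author_with_cascade(author_id, crash_between_steps=False):
    # Step 1: delete the referenced document.
    del authors[author_id]
    if crash_between_steps:
        raise Crash("application stopped before the cascade ran")
    # Step 2: a separate operation removes the documents referencing it.
    to_delete = [pid for pid, post in posts.items() if post["author"] == author_id]
    for post_id in to_delete:
        del posts[post_id]


try:
    delete_author_with_cascade(1, crash_between_steps=True)
except Crash:
    pass

# The post still points to an author that no longer exists.
dangling = [post for post in posts.values() if post["author"] not in authors]
print(dangling)
```

Nothing rolls back step 1 when step 2 never runs, so the dangling reference stays in the data until someone notices it.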

I attended a talk on the MongoDB driver during the week; at the end someone asked the speaker what he thinks about Mongoengine:

In a nutshell: people think of Mongoengine like a way to do SQL stuff in Mongo !

People think they don't need an ODM (but they do !)

After embarrassing people with my Mongoengine question, I tried to put them at ease by asking about the ODM they use. The answer was "Nothing, just plain pymongo !" pretty often (the "don't want to make two poor choices in a row" effect I guess).

However, after digging a bit, I realized those people had created their own validation system around pymongo, and so ended up with a custom ODM once their project had grown and they tried to refactor it...

If you go further in the video above, that's precisely what the next speaker says: plenty of people use Mongoengine because they want a way to validate their documents.

AsyncIO is "the next big thing"™

The number of talks about asyncIO was crazy high; the nodejs community has proved Twisted was right a decade ago (well, it's not a silver bullet, but I think it goes along pretty well with MongoDB's big data credo). Now if you want to do asynchronous MongoDB, there are drivers. But you run short on ODMs: I could only find motorengine, which is a fork of... Mongoengine !

Where to go from there ?

  • MongoDB is not SQL, you don't want it to feel like SQL because that would mean a big lie behind the scenes
  • An ODM should give you object orientation and data validation, nothing more
  • Even schema-less, you need to validate what's going into your DB (think of MongoDB as Python: "this variable can be anything but it won't be because otherwise my program will blow up !")
  • An async ODM is clearly missing here ;-)
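
To illustrate the second and third points, here is a rough sketch of what "object orientation and data validation, nothing more" could look like, in pure Python. This is not μMongo's API (nor Marshmallow's): `Field`, `Document`, `User` and `ValidationError` are names invented for the example.

```python
class ValidationError(Exception):
    pass


class Field:
    """A typed field, optionally required."""

    def __init__(self, type_, required=False):
        self.type_ = type_
        self.required = required

    def validate(self, name, value):
        if value is None:
            if self.required:
                raise ValidationError("%s is required" % name)
            return
        if not isinstance(value, self.type_):
            raise ValidationError(
                "%s must be of type %s" % (name, self.type_.__name__)
            )


class Document:
    """Object-oriented wrapper that validates data before it reaches the DB."""

    fields = {}

    def __init__(self, **data):
        unknown = set(data) - set(self.fields)
        if unknown:
            raise ValidationError("unknown fields: %s" % sorted(unknown))
        for name, field in self.fields.items():
            field.validate(name, data.get(name))
        self._data = data

    def to_mongo(self):
        # The plain dict you would hand to the driver.
        return dict(self._data)


class User(Document):
    fields = {"name": Field(str, required=True), "age": Field(int)}


user = User(name="alice", age=30)
print(user.to_mongo())

try:
    User(name="bob", age="not a number")
except ValidationError as exc:
    print(exc)
```

The driver only ever sees what `to_mongo()` returns, so the database stays schema-less while the application does not.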

So here is μMongo, a simple ("μ" used to stand for "micro" but given it is now pretty feature-full it is now only the letter "mu") MongoDB ODM compatible with the major drivers (both sync and async !) and relying on Marshmallow for all the validation/serialization stuff (why reinvent the wheel when a first class library is available ?).

Let's go to the Euro Pycon !

2016-07-17 - Emmanuel Leblond misc

Tomorrow starts the Euro Pycon 2016 in Bilbao ;-)

Well it actually started today, but the events were only for women ("Django Girls") and newcomers to python ("Beginner's day"). The latter caught my attention btw: I would have guessed only long-time lovers of a language would come to a conference about it !

But on second thought, when I look back at how long I've been struggling to learn Haskell, such an event could be exactly the impulse I need to overcome my daily routine and get into the ecosystem & the philosophy gravitating around it.

It hasn't started yet and I'm already learning ! What a week this is gonna be ;-)

EDIT: What a week indeed !

  • Tons of Goodies: T-shirts, stickers, towel, powerpack battery, and even a micropython board !
  • Conferences, riddles organized by the sponsors
  • I joined the PyPy team during the sprint weekend, they are as awesome as their project !
  • Good food from the Basque country, we got an event inside a cider house too
  • And of course plenty of cool people to hang out with ;-)

Can't wait for the next PyCon

Learning async programming the hard way

2016-06-13 - Emmanuel Leblond python

This weekend, I noticed diesel.io is no longer reachable. I guess it's time to organize the funeral of this project.

First a word about the deceased, from its PyPI page:

diesel is a framework for easily writing reliable and scalable network applications in Python. It uses the greenlet library layered atop asynchronous socket I/O in Python to achieve benefits of both the threaded-style (linear, blocking-ish code flow) and evented-style (no locking, low overhead per connection) concurrency paradigms. It’s design is heavily inspired by the Erlang/OTP platform.

Announced in 2010, the project was well received (+500 stars on github) but eventually failed to carve out its niche in the crowded world of Python async frameworks.

In late 2013, Bump, the company that developed diesel for its own needs, was bought by Google; its products were discontinued and its team merged.

This put an abrupt stop to the project, which went into hibernation: diesel's contributions

It was only a year later that I got interested in the project.

Why? Maybe it's because I've always had a weakness for lost causes (to paraphrase Rhett Butler).

What really caught my eye was that diesel offers an implementation of Flask and its ecosystem (leaving aside the extensions making use of the network, whose blocking nature couldn't play nice in an async framework).

The drawback, however, was that diesel was Python 2 only and, as I stated earlier, its community was sparse if not vanished.

Given I had some free time (that's the reason I started researching async frameworks in the first place) I chose the hard way: porting the project to Python 3 !

Who knows ? Maybe seeing someone investing time into their project could re-motivate its creators, and with some posts & benchmarks on reddit we could even bring new people in...

Dowski, the maintainer of the project, was really friendly and helpful and shared my hopes on the project. So I started working on the port, and a month later the work was done.

However my attempt to shake the sleeping giant fell short: too little free time, missing motivation, and Diesel4 (the new major version with Python 3 support) didn't get released on PyPI; in fact it didn't even make its way to master :'-(

Put this way, this seems harsh: I worked hard and no-one will ever use my work (not even me !). But in fact it's quite the opposite !

One of the biggest grievances Armin Ronacher has with Python 3 is the str/bytes separation, which makes porting low-level code pretty hard. Well, diesel was full of such things (protocols, transport, socket communication etc.) so porting the code was far more than just a 2to3 pass: I actually had to read and understand all the code !
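
To give an idea of what the str/bytes separation means for protocol code, here is a small sketch that encodes a command in the Redis wire format (RESP). This is not diesel's code: `encode_command` is a hypothetical helper written for this post. In Python 2 you could glue plain strings together and send them; in Python 3 every piece has to be bytes, and lengths must be counted in bytes, not characters.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings (bytes only)."""
    chunks = [b"*" + str(len(args)).encode("ascii") + b"\r\n"]
    for arg in args:
        # Text must be explicitly encoded before hitting the socket.
        data = arg.encode("utf-8") if isinstance(arg, str) else arg
        # The length prefix counts bytes, not characters.
        chunks.append(b"$" + str(len(data)).encode("ascii") + b"\r\n" + data + b"\r\n")
    return b"".join(chunks)


payload = encode_command("SET", "name", "μMongo")
print(payload)
print(len("μMongo"), len("μMongo".encode("utf-8")))  # characters vs bytes
```

Here "μMongo" is 6 characters but 7 bytes once encoded as UTF-8, which is exactly the kind of detail a mechanical 2to3 pass will not sort out for you.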

This project gave me the opportunity to go deep into async programming, to understand the reactor pattern and hack into an implementation of it, to discover greenlet, to play with the redis and mongodb protocols, and to discover someone finally did something to replace the awful logging module.

With asyncio rising as the new unified standard for async programming in Python, diesel is doomed no matter what. So it's time to move on and let it rest in peace, but I would still suggest you go pay your respects to it: may its code teach you as much as it taught me.


PS: After 2 years of inactivity, Dowski's blog got a new entry:

I think it's time to make things again.

Dowski

Is it really the end of diesel ?

The meta article - Warmup with lektor

2016-06-09 - Emmanuel Leblond python misc

This blog aims at writing about my experiences in programming, so what better start than to talk about my experience building this blog !

First and foremost, show me the code.

Hearing about Armin "flask" Ronacher's new project, I figured lektor was exactly the tool I was looking for. Besides, a young project means opportunities to contribute, and adding your little rocks to the growing mountain is so rewarding ;-)

Working with lektor feels like flask: you get the concept at a glance, have a working PoC in minutes, and thanks to its great architecture there are plugins for basically all your needs.

I have to say this is really impressive for such a young project !

Now for (literally) the big picture:

[Image: workflow diagram]

This is a pretty classic workflow, a few comments though:

lektor provides plenty of ways to deploy, my choice was... none ! Armin presented lektor as a way for him to handle his parents' blog without suffering too much. Basically he (as the "tech guy") had to configure the project first (how to deploy, the css etc.), then he could hand it to his parents, who would be able to create & update posts and deploy from inside a CMS gui (the dev lektor server running on their own computer). Furthermore, they could just throw the lektor project folder inside their Dropbox and get their blog backed up without hassle !

This is an awesome workflow for non-technical people, but as someone who uses git every day I want to be able to write an article in a branch, then merge it to master and finally have my blog updated once the code is pushed on github...

...which leads us to Travis. I love the way you can store secrets inside the .travis.yml. Travis creates an RSA key pair for each project and uses it to encrypt your data. Of course only trusted builds have access to that data, to prevent someone from sending you a PR with echo $MY_SECRET_VAR to steal your stuff !

Regarding the hosting, I first thought about github pages but I don't feel comfortable storing my Github credentials within Travis (even if I could create a token with limited access, the granularity would allow write access to all my public repos...).

So I headed over to netlify: the service seems good. They put a little token (can't miss it though !) on the right corner, but given the storage is free I would call this an exchange of interests ;-)

I'm less happy with the fact you must use the netlify command to deploy your stuff on their servers, sftp cannot beat the swag of a nodejs bloat with a shiny colored cli I guess...