William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories

Perl vs. Python vs. Ruby

This article is from the web. I’m evaluating Python and Ruby as replacements for Perl.

I’ve been using Perl for several years and am very comfortable with it, although I’m definitely not an expert. Perl is a powerful language, but I think it’s ugly and encourages writing bad code, so I want to get rid of it. Python and Ruby both come with Mac OS X 10.2, both have BBEdit language modules, and both promise a cleaner approach to scripting. Over the past few weeks I read the Python Tutorial and the non-reference parts of Programming Ruby; however, as of this afternoon I had not yet written any Python or Ruby code.

Here’s a toy problem I wanted to solve. eSellerate gives me a tab-delimited file containing information about the people who bought my shareware.
I wanted a script to extract from this file the e-mail addresses of people who asked to be contacted when I release the new versions of the products.

I decided to solve this problem in each language and then compare the resulting programs. The algorithm I chose was just the first one that came to mind. I coded it first in Ruby, and then ported the code to Python and Perl, changing it as little as possible. Thus, the style is perhaps not canonical Python or Perl, although since I’m new to Ruby it’s probably not canonical Ruby either. If I were just writing this in Perl, I might have tried to avoid Perl’s messy syntax for nested arrays and instead used an array of strings.

Here’s the basic algorithm:

  1. Read each line of standard input and break it into fields at each tab.
  2. Each field is wrapped in quotation marks, so remove them. Assume that there are no quotation marks in the interior of the field.
  3. Store the fields in an array called record.
  4. Create another array, records, and fill it with all the records.
  5. Make a new array, contactRecords, that contains arrays of just the fields we care about: SKUTITLE, CONTACTME, EMAIL.
  6. Sort contactRecords by SKUTITLE.
  7. Remove the elements of contactRecords where CONTACTME is not 1.
  8. Print contactRecords to standard output, with the fields separated by tabs and the records separated by newlines.
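
To make the eight steps concrete, here is a minimal Python sketch run on made-up data. It is only an illustration under stated assumptions: the three-column layout, the sample addresses, and the function name extract_contacts are all hypothetical (the real export has many more columns), and the trailing newline is stripped explicitly, which the steps above do not mention.

```python
# Hypothetical layout: three columns instead of the real export's many,
# so these positions are illustrative only.
SKUTITLE, CONTACTME, EMAIL = 0, 1, 2

def extract_contacts(lines):
    """Steps 1-7: parse, project, sort by SKU title, keep CONTACTME == "1"."""
    records = []
    for line in lines:
        # Steps 1-3: split at tabs and strip the wrapping quotation marks.
        fields = line.rstrip("\n").split("\t")
        records.append([field.replace('"', '') for field in fields])
    # Step 5: keep only the fields of interest, SKU title first.
    contact_records = [[r[SKUTITLE], r[CONTACTME], r[EMAIL]] for r in records]
    # Step 6: lists compare element by element, so SKU title is the primary key.
    contact_records.sort()
    # Step 7: drop buyers who did not ask to be contacted.
    return [r for r in contact_records if r[1] == "1"]

# Made-up input lines in the quoted, tab-delimited shape described above.
sample = [
    '"Widget"\t"1"\t"ann@example.com"\n',
    '"Gadget"\t"0"\t"bob@example.com"\n',
    '"Gadget"\t"1"\t"cat@example.com"\n',
]

# Step 8: tab-separated fields, one record per line.
for r in extract_contacts(sample):
    print("\t".join(r))
```

With this sample, the two records that survive are the Gadget buyer followed by the Widget buyer, each with CONTACTME equal to 1.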

And here’s the code:


#!/usr/bin/perl -w

use strict;

my @records = ();

# Read every line of input and split it into tab-separated fields,
# stripping the quotation marks that wrap each field.
foreach my $line ( <> ) {
    my @record = map {s/"//g; $_} split("\t", $line);
    push(@records, \@record);
}

# Column positions of the fields we care about.
my $EMAIL = 17;
my $CONTACTME = 27;
my $SKUTITLE = 34;

my @contactRecords = ();
foreach my $r ( @records ) {
    push(@contactRecords, [$$r[$SKUTITLE],
          $$r[$CONTACTME], $$r[$EMAIL]]);
}

# Sort by SKU title, then keep only buyers who asked to be contacted.
@contactRecords = sort {$$a[0] cmp $$b[0]} @contactRecords;
@contactRecords = grep($$_[1] eq "1", @contactRecords);

foreach my $r ( @contactRecords ) {
    print join("\t", @$r), "\n";
}

The punctuation and my’s make this harder to read than it should be.



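# Python version of the same algorithm.
import fileinput

records = []

# Split each input line at tabs and strip the wrapping quotation marks.
for line in fileinput.input():
    record = [field.replace('"', '') for field in line.split("\t")]
    records.append(record)

# Column positions of the fields we care about.
EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords=[[r[SKUTITLE], r[CONTACTME], r[EMAIL]] for r in records]
contactRecords.sort() # default sort will group by sku title
contactRecords = filter(lambda r: r[1] == "1", contactRecords)

for r in contactRecords:
    print("\t".join(r))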

I think the Python version is generally the cleanest to read—that is, it’s the most English-like. I had to look up how join and filter worked, because they weren’t methods of list as I had guessed.
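
A short sketch of the two calls mentioned above may help show why they are easy to guess wrong. The rows are made up for illustration, and the list(...) wrapper assumes Python 3, where filter returns a lazy iterator rather than a list.

```python
# Made-up rows in the [SKUTITLE, CONTACTME, EMAIL] shape used by the script.
rows = [
    ["Gadget", "1", "cat@example.com"],
    ["Widget", "0", "dan@example.com"],
]

# join is a method of the separator string, not of the list being joined.
line = "\t".join(rows[0])

# filter is a built-in function: predicate first, then the sequence.
# list(...) is needed on Python 3 to materialize the result.
wanted = list(filter(lambda r: r[1] == "1", rows))

# A list comprehension expresses the same filter without a lambda.
wanted_too = [r for r in rows if r[1] == "1"]

print(line)
print(wanted == wanted_too)
```

The list comprehension is often considered the more idiomatic spelling, although both forms select the same rows.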



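# Ruby version of the same algorithm.
records = []

# Split each input line at tabs and strip the wrapping quotation marks.
# The separator must be double-quoted: '\t' in single quotes is a
# literal backslash followed by t, not a tab.
while gets
    record = $_.split("\t").collect! {|field| field.gsub('"', '') }
    records << record
end

# Column positions of the fields we care about.
EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords=records.collect {|r| [r[SKUTITLE], r[CONTACTME], r[EMAIL]] }
contactRecords.sort! # default sort will group by sku title
contactRecords.reject! {|a| a[1] != "1"}

contactRecords.each {|r|
    print r.join("\t"), "\n"
}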

14 responses to “Perl vs. Python vs. Ruby”

  1. Tony Su 11/27/2010 at 5:45 pm

    Seems Python is very popular in back-end programming these days :)

  2. Gabor Szabo 11/27/2010 at 8:20 pm

    I think experimenting with other languages or even switching most or all of your coding to other languages does not need to be justified. You could use the positive energies in there of “learning new things” and “having fun” instead of the negative ones “the other one is ugly”. Especially as it sounds like repeating a “common wisdom” of the Python programmers while I am sure you can make up your mind without that.

    This code

    foreach my $r ( @records ) {
        push(@contactRecords, [$$r[$SKUTITLE], $$r[$CONTACTME], $$r[$EMAIL]]);
    }

    could be written as

    foreach my $r ( @records ) {
        push(@contactRecords, [$r->[$SKUTITLE], $r->[$CONTACTME], $r->[$EMAIL]]);
    }

    eliminating the awful $$ for dereferencing or if you’d like to make it more similar to the other solutions you could write

    for my $r (@records) { push @contactRecords, [$r->[$SKUTITLE], $r->[$CONTACTME], $r->[$EMAIL]] }

    or even

    push @contactRecords, [$_->[$SKUTITLE], $_->[$CONTACTME], $_->[$EMAIL]] for @records

    or

    push @contactRecords, map { [$_->[$SKUTITLE], $_->[$CONTACTME], $_->[$EMAIL]] } @records

    IMHO these are all much cleaner than the one you picked but that would not go well with the standard Python FUD that seemed to catch you as well.
    Of course I know, that the fact I gave 4 solutions here will trigger the other Python FUD about “there is too many ways to write it in Perl and we like to be restricted”.

    • jon 12/04/2010 at 5:57 pm

      Did you edit that first for loop in Python in the last minute or two? If so, good change; if not, I’m trippin. You can put that entire for loop into a list comprehension (or map for that matter) if you care to. Not a huge deal, I just got care for functions that sign without assignment when I can help it (probably an unnecessary preference rooted in functional programming experience).

    • williamjxj 12/07/2010 at 1:20 pm

      I agree. I didn’t use $$; $r->[] and $_ are definitely better.

    • Andrew 03/16/2012 at 2:12 am

      That push statement could be written like so:

      push @contactRecords, \@$r($SKUTITLE, $CONTACTME, $EMAIL);

      Granted that if you’re not familiar with the idiom (and array slices are a much underused feature) you might need to think for a moment about what it means, but then I’ve never seen your perl idiom either. I *never* have reason to use “$$”. You might prefer [ ] rather than \ to produce the reference you are pushing.

      If I was doing it though, I’d change the algorithm a bit. I would have condensed the record earlier on, discarding unwanted info as soon as possible.

      foreach my $line ( <> )
      my @record = split(/\t/, $line);
      push(@records, [
      map {s/^"//; s/"$//}

      This saves running the map statement on values you don’t need, and also (less obviously) means that because you aren’t editing the values at all, perl doesn’t need to copy the strings, it can just refer to locations in the memory used by $line. I know your concern here is more about style than performance, but I thought I’d point it out anyway. Someone who doesn’t know the performance benefit won’t be hindered in reading the code.

      My map statement is also a bit different. It’s more efficient, more correct with regards to quotes within the field, and doesn’t have the redundant “; $_” at the end of the map block. No string copying is required by these regexes.

      I’m also explicitly constructing the array referred to by the reference I’m pushing onto @records, i.e. using the [ ] operator rather than \ . You are repeatedly pushing references to the same array, which I think failed with some earlier versions of perl, though testing it now with a modern version it seems OK.

  3. Richard C. 12/06/2010 at 4:12 am

    For me, the Ruby code looks the easiest and the most logical.

  4. camel 12/07/2010 at 1:14 am

    push(@contactRecords, [$$r[$SKUTITLE], $$r[$CONTACTME], $$r[$EMAIL]]);

    could be written as

    push @contactRecords, [ @$r[$SKUTITLE, $CONTACTME, $EMAIL] ];

  5. Roman 12/07/2010 at 7:45 am

    Hello, maybe you can get a bit more concise perl code. Something like below, maybe

    use strict;
    use constant {
        EMAIL => 17,
        CONTACTME => 27,
        SKUTITLE => 34,
    };

    # Read all input lines, keep three fields per line, filter, then sort.
    my @contact_records =
        sort { $a->[0] cmp $b->[0] }
        grep { $_->[1] eq "1" }
        map {
            [ ( map { s/"//g; $_ } split "\t" )[SKUTITLE, CONTACTME, EMAIL] ]
        } <>;

    for my $r (@contact_records) {
        print join("\t",@$r), "\n";
    }

    By using constants you can get rid of sigils there. Also it is better to grep before sort, because sorting fewer records will be faster.

    — Roman
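
The point about filtering before sorting carries over to the other languages. Below is a minimal Python sketch with made-up records; the two function names are hypothetical. Both orders give the same result, but the second one hands the sort a shorter list.

```python
# Made-up rows in the [SKUTITLE, CONTACTME, EMAIL] shape used by the article.
rows = [
    ["Widget", "1", "ann@example.com"],
    ["Gadget", "0", "bob@example.com"],
    ["Gadget", "1", "cat@example.com"],
    ["Widget", "0", "dan@example.com"],
]

def sort_then_filter(records):
    # Sorts every record, including the ones about to be discarded.
    return [r for r in sorted(records) if r[1] == "1"]

def filter_then_sort(records):
    # Discards unwanted records first, so sorted() sees fewer items.
    return sorted(r for r in records if r[1] == "1")

# Same result either way; only the amount of sorting work differs.
assert sort_then_filter(rows) == filter_then_sort(rows)
for r in filter_then_sort(rows):
    print("\t".join(r))
```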

  6. Roman 12/07/2010 at 7:46 am

    I should have used pre instead of code. Sorry about that.

    • williamjxj 12/07/2010 at 1:16 pm

      Yes, <pre> instead of <code> seems to look better. Anyway, that doesn’t matter.

      Your Perl code is a super implementation that uses sort, map, and grep together. I like it and will give it a try.

      • Roman 12/08/2010 at 2:06 am

        Glad you like it. In code this short it does not matter, but for a larger program you might want to make an object for each record, so you get an accessor for each column. I usually use the core Class::Struct module to build such simple classes.
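
For comparison, a rough Python analogue of the record-object idea could use collections.namedtuple. This is only a sketch: the class name ContactRecord, the helper to_contact_record, and the 35-column sample row are hypothetical, while the column positions are the ones used in the article's scripts.

```python
from collections import namedtuple

# Column positions taken from the article's scripts.
EMAIL, CONTACTME, SKUTITLE = 17, 27, 34

# ContactRecord is a hypothetical name; each field becomes a read-only accessor.
ContactRecord = namedtuple("ContactRecord", ["skutitle", "contactme", "email"])

def to_contact_record(fields):
    """Build a record object from one already-split line of the export."""
    return ContactRecord(fields[SKUTITLE], fields[CONTACTME], fields[EMAIL])

# A made-up 35-column row with only the three interesting columns filled in.
fields = [""] * 35
fields[SKUTITLE] = "Gadget"
fields[CONTACTME] = "1"
fields[EMAIL] = "cat@example.com"

record = to_contact_record(fields)
print(record.email)  # named access instead of record[2]
```

Because a namedtuple is still a tuple, sorting a list of such records orders them by skutitle first, just as the nested lists do.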

  7. Robert S. 12/07/2010 at 10:18 am

    I stumbled upon this blog post and thought I’d write a more idiomatic Ruby (1.9) version. I hope you’ll find it interesting.
