Primitive root mod p

Transire suum pectus mundoque potiri

Functional Programming in Python 26 April, 2008

Filed under: Programming — Nikolas Karalis @ 2:37 pm

Recently, I decided to learn how to use the functional programming tools provided by Python.

The truth is, they are surprisingly powerful.

The whole concept consists of using 3 built-in functions (map, filter, reduce), lambda functions and of course the beloved List Comprehensions.

In order of appearance :

map (function, sequence) : It applies function to every item of the sequence and returns the results.

You can even provide a function which takes two arguments and use it like this :

map (function, seq1, seq2)

filter (function, sequence) : It returns the items of the sequence for which function(item) is true.

reduce (function, sequence) : It applies the function (2 arguments) to the first two items of the sequence, and then to the result and the third etc…
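Here is a minimal sketch of the three functions in action. It is written for Python 3, where map and filter return lazy iterators (hence the list(...) calls) and reduce has moved to the functools module. The helper functions and the sample lists are made up for illustration only.

```python
from functools import reduce  # in Python 3, reduce lives in functools

# Hypothetical helper functions, defined only for this example
def cube(x):
    return x ** 3

def add(x, y):
    return x + y

def is_even(x):
    return x % 2 == 0

def multiply(x, y):
    return x * y

numbers = [1, 2, 3, 4, 5]  # sample data

# map with one sequence: apply the function to every item
cubes = list(map(cube, numbers))

# map with two sequences: the function receives one item from each
sums = list(map(add, numbers, [10, 20, 30, 40, 50]))

# filter: keep only the items for which the function returns a true value
evens = list(filter(is_even, numbers))

# reduce: ((((1 * 2) * 3) * 4) * 5)
product = reduce(multiply, numbers)

print(cubes)    # [1, 8, 27, 64, 125]
print(sums)     # [11, 22, 33, 44, 55]
print(evens)    # [2, 4]
print(product)  # 120
```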

List comprehensions :

This is a powerful feature of Python.

I will explain it by an example :

>>> a=range(10)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> [x**3 for x in a]
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

Lambda functions :

With lambda functions you can create anonymous functions on the fly.

For example, the normal definition of a function is :

def f(n): return n**3

With lambda functions :

f = lambda n: n**3

In both cases, we use it like this :

>>> f(5)
125

An example of using lambda functions with the map function would be like this :

>>> a=range(10)

>>> map(lambda x: x**3, a)
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

This is exactly the same as the example in list comprehensions.

So, with all the above we have a whole new range of tools for dealing with lists.
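As a small sketch of how these tools combine, the snippet below computes the cubes of the even numbers in two equivalent ways and then folds them into a sum. It assumes Python 3 (list(...) around map, reduce imported from functools); the variable names are only illustrative.

```python
from functools import reduce  # needed in Python 3

a = range(10)

# List comprehension with a condition: cubes of the even numbers only
even_cubes = [x ** 3 for x in a if x % 2 == 0]

# The same result with filter + map and lambda functions
even_cubes_fp = list(map(lambda x: x ** 3, filter(lambda x: x % 2 == 0, a)))

# reduce folds the list into a single value (here, the sum)
total = reduce(lambda x, y: x + y, even_cubes)

print(even_cubes)                   # [0, 8, 64, 216, 512]
print(even_cubes == even_cubes_fp)  # True
print(total)                        # 800
```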

 

Seismology, Snow & Procrastination 18 February, 2008

Filed under: /dev/random, Programming — Nikolas Karalis @ 1:29 am

I haven’t updated my blog for a while. No specific reason… I was just procrastinating since exams were coming and I didn’t have much to say.

The latest news from Greece is the earthquakes and the snow (and of course my exams :-P ).

In the last few days, 2 big earthquakes (6.2 R and 6.9 R) shook Southern Greece and last night, it began snowing in Athens!

This happens once every one or two years, so it is big news.

So last night, a small earthquake reminded me of the passionate discussions about the earthquakes, and since I didn’t have much to do because of the snow and my constant procrastination of the last month, I decided to build a database of the Greek earthquakes of the last few years.

So, a little bit of coding and a few hours later, I proudly (:P) present my Greek Seismological Database (click here to visit).

EDIT : A new addition : It is automatically updated when a new earthquake happens…

I end this post with a few photos from Athens in white.

[Photos: Athens in the snow]

 

Thesis Database and Python CGI uploading. 28 January, 2008

Filed under: Programming — Nikolas Karalis @ 6:39 pm

A few days ago I had the idea that it would be really nice to have a database of Greek theses and dissertations about mathematics and science in general. As far as I know, there are a few databases around, mostly for Electrical Engineering and Computer Science dissertations. So, I thought that it would be a good opportunity for me to exercise my CGI and Python Web scripting skills.

And here I am, presenting the Thesis Database project. I hope that it will be useful and people will contribute.

But while coding the CGI backbone, I had a few problems to solve, and since I had to come up with the solutions myself (I couldn’t find anything useful online), I decided to post them here for future reference. I will also give the basic idea of how a Python CGI uploading script works. The focus is on the security of the code.

So, the following is a very simple HTML form, which will be used as the user interface for the upload.

We assume that the CGI script is called upload.py and is placed inside the $Web root$/cgi-bin/ directory.

upload.html

<html>

<head> <title>Upload Example</title> </head>

<body> <div align="center">

<form action="/cgi-bin/upload.py" method="POST" enctype="multipart/form-data">

File : <input name="file" type="file" size="35"><BR>
<P><input name="submit" type="submit" value="Upload"></P>

</form>

</div> </body> </html>

(more…)

 

Python Hacking 7 January, 2008

Filed under: Programming — Nikolas Karalis @ 5:43 am

Efficiency Tips

In this post, I will try to collect tips and tricks for efficient Python programming. Feel free to contribute.

I will update every once in a while, so check back…

  • Fastest string conversion :
>>> a=123
>>> `a` 
'123'
  • Fastest string concatenation :
>>> ''.join(['a','b','c'])
'abc'

Efficiency comparison

Optimization Anecdote (Guido van Rossum)

  • List comprehensions :
>>> a=[1,2,3]
>>> b=[x**2 for x in a]
>>> b
[1, 4, 9]
  • Timing a process :
from time import clock
t=clock()
... 
t=clock()-t
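The tips above are written for Python 2. As a rough sketch of how they might look in Python 3: backticks no longer exist, so repr() (or str()) takes their place, and time.clock() was removed in Python 3.8, so perf_counter() stands in for it here. The squares_sum function is a made-up workload, used only so that there is something to time.

```python
from time import perf_counter  # replacement for the removed time.clock()

a = 123
as_text = repr(a)                  # backticks were shorthand for repr() in Python 2
joined = ''.join(['a', 'b', 'c'])  # join takes one iterable of strings

def squares_sum(n):
    # Hypothetical workload: a generator expression avoids building a list
    return sum(x ** 2 for x in range(n))

t = perf_counter()
result = squares_sum(100000)
t = perf_counter() - t             # elapsed time in seconds

print(as_text, joined, result)
print("elapsed: %.6f seconds" % t)
```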

(more…)

 

Deleting duplicate lines from file 5 January, 2008

Filed under: Programming — Nikolas Karalis @ 10:27 pm

I’ve been fighting with a computational problem for many days, but I haven’t even come close to an acceptable solution.

The problem statement :

Given a file of n lines, return the index numbers of the duplicate lines (where the index is 0 for the first line, 1 for the 2nd etc.)

It may sound trivial, even silly, but I can assure you it is not. It can be relatively simple for a small number of lines.

But as n grows, the problem quickly becomes much harder.

When I first faced the problem, I had to deal with a few hundred thousand lines. So, I came up with a simple Python script to do it.

When the first difficulties appeared, I came up with a faster solution.

def duplicates(sequence):
    # Remember every item seen so far; an item seen again is a duplicate.
    visited = {}
    dupl = []
    for x in sequence:
        if x in visited:
            dupl.append(x)
        visited[x] = 1
    return dupl
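The function above collects the duplicated lines themselves, while the problem statement asks for their index numbers. A minimal sketch of a variant that returns indices, assuming that only the repeats (not the first occurrence) count as duplicates; the name duplicate_indices and the sample list are hypothetical.

```python
def duplicate_indices(lines):
    """Return the 0-based indices of lines that repeat an earlier line."""
    seen = set()
    indices = []
    for index, line in enumerate(lines):
        if line in seen:
            indices.append(index)  # this line already appeared earlier
        else:
            seen.add(line)
    return indices

sample = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
print(duplicate_indices(sample))  # [2, 4, 5]
```

Any iterable of lines works, so an open file object can be passed directly and is read one line at a time; the set of seen lines is still kept in memory.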

However, now that I have to find the duplicate lines in files with 15 million, 26 million and more lines, it is impossible to use this code, since it raises memory errors.
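One possible way to ease the memory pressure, sketched here as an assumption rather than a tested solution for files of that size, is to remember a short fixed-size digest of each line instead of the line itself. This only helps when the lines are considerably longer than the 16-byte digest, and it accepts a (very small) chance that two different lines share a digest. The function name and the sample data are hypothetical.

```python
import hashlib

def duplicate_indices_hashed(lines):
    """Like a plain 'seen' set, but stores 16-byte digests instead of lines."""
    seen = set()
    indices = []
    for index, line in enumerate(lines):
        # blake2b with digest_size=16 yields a 16-byte fingerprint of the line
        digest = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
        if digest in seen:
            indices.append(index)
        else:
            seen.add(digest)
    return indices

sample = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
print(duplicate_indices_hashed(sample))  # [2, 4, 5]
```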

So I came up with another idea, which is REALLY slow for now.

  1. Sort the file with the Windows or Unix command : sort
  2. Use the Unix command uniq -d, to get a list of the duplicated lines.
  3. Use the Unix command grep -n, on the unsorted file for every line in the previous list, to get which lines are duplicated.
  4. Use a simple Python script to parse the result and get only the integers we are interested in.
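A variant of the steps above, sketched in Python: if every line carries its original index through the sort, the duplicates can be read off from neighbouring entries and the grep pass is not needed. This is only an in-memory illustration of the idea, so by itself it does not solve the memory problem; for huge files the sorting would have to be done externally. The function name and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def duplicate_indices_by_sorting(lines):
    # Decorate each line with its original index: (line, index)
    decorated = [(line, index) for index, line in enumerate(lines)]
    # After sorting, equal lines are adjacent, ordered by index within a group
    decorated.sort()
    duplicates = []
    for _, group in groupby(decorated, key=itemgetter(0)):
        group_indices = [index for _, index in group]
        # Skip the first occurrence; the rest are duplicates
        duplicates.extend(group_indices[1:])
    return sorted(duplicates)

sample = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
print(duplicate_indices_by_sorting(sample))  # [2, 4, 5]
```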

However, the grep part is REALLY slow for huge files. So, my problem remains. I have, however, reduced the problem from removing duplicates from a file to quickly finding the index of a line in a file or, equivalently, to fast iteration over and comparison of the file’s lines.

After extensive digging on the Internet, I was not able to find any efficient algorithm or implementation.