While taking Vivek Ramachandran’s course on Python for Pentesting, I reached lecture 13, which deals with processes. Personally, I feel it jumps to a more intermediate/advanced topic than where we were in the previous lectures.
Because of that jump in difficulty, I pulled together some information from various sources to help digest what he’s teaching here.
What is a Process vs. a Thread
What exactly is a process? It is briefly described in this post on Stack Overflow, which I’ve quoted below:
The threading module uses threads, the multiprocessing uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for. –Sjoerd
Another SO user (Jeremy Brown) offers a more detailed explanation, but it is a bit beyond the scope of what I’m trying to learn here.
As you can see above, each has its own pros and cons. Threads run multiple actions in shared memory, while processes use separate memory for each action being spawned. The latter makes it more difficult to share objects between actions.
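To make the shared vs. separate memory point concrete, here is a minimal sketch of my own (not from the SO post). It assumes a Unix-like system where os.fork() is available, and the function names demo_thread and demo_fork are made up for illustration. A thread appends to a list and the caller sees the change; a forked child appends to its own copy and the parent sees nothing.

```python
import os
import threading


def demo_thread():
    """A thread shares memory with its creator, so the append is visible."""
    items = []
    worker = threading.Thread(target=items.append, args=("from thread",))
    worker.start()
    worker.join()
    return items


def demo_fork():
    """A forked child works on its own copy, so the parent's list stays empty."""
    items = []
    pid = os.fork()
    if pid == 0:
        # Child process: this changes only the child's copy of the list.
        items.append("from child")
        os._exit(0)  # leave immediately without running the parent's code
    # Parent process: wait for the child, then look at our own list.
    os.waitpid(pid, 0)
    return items
```

Calling demo_thread() should give back a list containing "from thread", while demo_fork() should give back an empty list, because the child's append never reaches the parent's memory.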
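Processes are also geared toward CPU-intensive tasks. In CPython, the global interpreter lock lets only one thread execute Python bytecode at a time, so threads can’t spread CPU-bound work across cores, while separate processes can.
An example provided on SO shows where processes make sense: it describes a task that uses multiple processes to take advantage of a multi-core CPU.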
There is more detail on this at Eli Bendersky’s blog. Eli states that:
In a previous post on Python threads, I briefly mentioned that threads are unsuitable for CPU-bound tasks, and multiprocessing should be used instead.
Eli provides a benchmarking script to back up his points.
So now we know that processes are CPU efficient (especially across multiple cores/CPUs), but they don’t share the same memory space, so objects can’t simply be shared between them; data has to be passed explicitly, for example through a pipe or a queue. Processes would therefore seem to be about spreading CPU-heavy work across multiple processors.
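Here is a rough sketch of that idea, and it is mine rather than Eli’s benchmark. It assumes a Unix-like system (it asks multiprocessing for the "fork" start method explicitly), and count_primes, worker and count_primes_parallel are names I made up. Each chunk of CPU-bound work runs in its own process, and because memory isn’t shared, the results come back to the parent through a queue.

```python
import multiprocessing as mp


def count_primes(limit):
    """CPU-bound toy task: count the primes smaller than limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def worker(limit, queue):
    # Runs in a child process; the result has to be sent back explicitly.
    queue.put((limit, count_primes(limit)))


def count_primes_parallel(limits):
    """Run one process per limit and collect the results in order."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(limit, queue)) for limit in limits]
    for proc in procs:
        proc.start()
    # Read the results before joining so no child blocks on a full queue.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return [results[limit] for limit in limits]
```

For example, count_primes_parallel([10, 20]) should return [4, 8]. With limits this small the process overhead outweighs any gain; the benefit only shows up when each task is heavy enough to keep a core busy.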
With processes, it all starts with the parent: the process that calls fork.
Fork creates a duplicate of the process that calls it. For the life of me I couldn’t understand it, nor why you would want it. In my limited understanding, I thought that we’d have a function that might do something like “execute ls” on the file system. So if we fork it, are we just running that same command multiple times? What’s the point?
Well, I didn’t get it. It took some research to find a real example. Most examples just print out some text in the child and the parent, which didn’t convey any sense of utility to me. How is this useful?
The child can run its own functionality, separate from the parent. That’s where I was confused: I thought that the parent/child relationship was just a copy of functionality. But see the example on Peter Collingridge’s blog, where he uses one process to write to a file and another process to write to a different file.
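To show the shape of that idea, here is a small sketch of my own. It is not Peter’s code; the function name write_in_two_processes and the file names are made up, and it assumes a Unix-like system with os.fork(). After the fork, the child writes one file and the parent writes another.

```python
import os
import tempfile


def write_in_two_processes():
    """Fork once; the child and the parent each write a different file."""
    directory = tempfile.mkdtemp()
    parent_path = os.path.join(directory, "parent.txt")
    child_path = os.path.join(directory, "child.txt")

    pid = os.fork()
    if pid == 0:
        # Child process: do the child's job, then exit right away.
        with open(child_path, "w") as handle:
            handle.write("written by the child\n")
        os._exit(0)

    # Parent process: do a different job, then wait for the child to finish.
    with open(parent_path, "w") as handle:
        handle.write("written by the parent\n")
    os.waitpid(pid, 0)
    return parent_path, child_path
```

Both files exist once the function returns, each written by a different process running its own branch of the same script.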
Python Process Forking Example
Using what Peter Collingridge explained in his blog, I created my own version of a fork scenario. In this case, the little Python script launches multiple servers bound to different ports. I used Vivek’s example server code (echoing user input back to the client). You’ll notice that each process echoes only to the connection it is handling. The console where the script runs shows the data coming in on both servers, because the child process inherits the parent’s terminal output, but each process echoes only to its own client.
The script itself will spawn two servers. Assume one connection from a client to each server: the script’s console will show the input received on all connections (to each server), but each echo goes back only to the client connected to that specific server. Sound confusing? Yeah, I know. I don’t know how else to describe it, but maybe it makes more sense with pictures.
import socket
import os


def create_server(host, port):
    tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without "Address already in use" errors.
    tcp_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp_socket.bind((host, port))
    tcp_socket.listen(4)

    print("Waiting for a client connection:")
    (client, (ip, client_port)) = tcp_socket.accept()
    print("Received connection from: ", ip)

    print("Starting Echo output")
    data = b'dummy'
    # recv() returns an empty result once the client disconnects.
    while len(data):
        data = client.recv(2048)
        print("Client data", data)
        client.send(data)

    print("Closing connection ...")
    client.close()


pid = os.fork()

if pid == 0:
    # os.fork() returns 0 in the child process.
    print("Child process has created a server listening on port 8887")
    create_server("127.0.0.1", 8887)
else:
    # In the parent, os.fork() returns the child's PID (a positive number).
    print("Parent process has created a server listening on port 7778")
    create_server("127.0.0.1", 7778)

'''
Reference Materials:
1) http://www.pentesteracademy.com/video?id=13 (Vivek's lecture on processes)
2) http://www.petercollingridge.co.uk/blog/running-multiple-processes-python (simple fork example)
3) http://www.pentesteracademy.com/video?id=18 (Vivek's code & lecture on creating servers)
'''
As we can see, the script I put together defines a function called create_server, which takes two parameters: host and port. Inside the function I repurpose the code example from Vivek Ramachandran’s lecture on creating servers.
The forking I’m doing, however, is the much simpler (IMO) approach offered in Peter’s blog. I simply call os.fork() and assign its return value to “pid.”
os.fork() returns in both processes: in the child it returns 0, and in the parent it returns the child’s PID (a positive number). So if pid is 0 we’re in the child process, and otherwise we’re in the parent.
In either case, I’m creating a server and binding it to a port.
The main script runs in one console. I open two more consoles and connect to each port (using netcat).
Console 1 runs the script, which spawns the two servers (one binding to port 7778 and the other to port 8887).
Console 2 runs netcat against port 7778, sending some text that gets echoed back.
Console 3 runs netcat against port 8887, sending some text that likewise gets echoed back.
Each session is separate from the other. In other words, when console 2 sends “Who am i,” it is echoed back only to that connection. When console 3 sends “i see,” it is similarly echoed back only to that connection.
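If you’d rather not juggle three consoles, the same round trip can be sketched in a single script. This is my own illustration, not Vivek’s code: echo_roundtrip is a made-up name, it assumes a Unix-like system with os.fork(), and it binds to port 0 so the OS picks a free port instead of 7778 or 8887. The child acts as the echo server and the parent plays the role of netcat.

```python
import os
import socket


def echo_roundtrip(message, host="127.0.0.1"):
    """Fork an echo server for one connection and send it `message` (bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, 0))  # port 0: let the OS choose a free port
    server.listen(1)        # listen before forking so the client can't race us
    port = server.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child process: echo everything back until the client disconnects.
        client, _address = server.accept()
        data = client.recv(2048)
        while data:
            client.sendall(data)
            data = client.recv(2048)
        client.close()
        os._exit(0)

    # Parent process: it only acts as the client, so drop the listening socket.
    server.close()
    reply = b""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(message)
        while len(reply) < len(message):
            chunk = conn.recv(2048)
            if not chunk:
                break
            reply += chunk
    os.waitpid(pid, 0)
    return reply
```

Calling echo_roundtrip(b"Who am i") should return the same bytes that were sent, just as netcat shows the text coming back in console 2.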
Each forked process has its own PID. The parent has a PID, which is different from the forked child’s PID. To get the PID, you simply call os.getpid() from within the process.
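A quick way to convince yourself of this is the sketch below. It is a minimal example of my own, assuming a Unix-like system; fork_and_report is a made-up name, and the pipe is only there so the child can tell the parent what os.getpid() returned on its side.

```python
import os


def fork_and_report():
    """Return (parent's PID, os.fork() result in the parent, child's own PID)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report our own PID through the pipe, then exit.
        os.close(read_fd)
        os.write(write_fd, str(os.getpid()).encode())
        os.close(write_fd)
        os._exit(0)

    # Parent process: read what the child reported about itself.
    os.close(write_fd)
    child_reported = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return os.getpid(), pid, child_reported
```

The second and third values should match, since the number os.fork() hands to the parent is the child’s PID, and both should differ from the first value, which is the parent’s own PID.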