Birth and Assassination Stochastic Process
Type of Activity
This is a multi-week project that could be used as an extended
homework assignment.
Outline of Scientific Context
The birth-and-assassination process is a variation of a birth and death process
in which the clock which counts down the time until a particles death
does not start ticking until the particle's parent dies.
This process was proposed by Aldous and Krebs.
The idea goes as follows:
- One individual is born at time 0.
- As soon as an individual is born, it starts to produce offspring
according to a Poisson (rate lambda) process.
- An individual cannot die before the time of its parent's death (time 0 for the initial individual).
- Once the parent dies, the individual lives for a further random time S , i.i.d. over individuals.
We consider the special case where the killing distribution is an
exponential distribution with intensity kappa.
The birth and assassination process is motivated by two applications:
- Database queuing system: This process is used to study the behavior of queuing system for a data base that blocks multiple queries from modifying the same entry simultaneously.
In this system, the arrival times of jobs follow a Poisson process, the service times are independent and identically distributed, and there are infinitely many
servers. A job is a descendant of another job is it is blocked by it (or is a descendant of a job blocked by it).
- Scotching a rumor in a network:
This process represents the dynamics of a rumor spreading on the vertices of a graph along its edges. A vertex may be unaware of the rumor, aware of the rumor and
spreading it as true, or aware of the rumor and trying to stop it.
(Charles Bordenave has shown it is equivalent to the birth and death process in an asymptotic sense.)
Computational Topics Covered
This assignment is a simulation study. It covers, random number generation,
computer experiments, response surface exploration, function writing,
profiling code, data structures.
Computational Learning Objectives
-
Function writing, including identifying subtasks and writing "helper" functions to
handle these tasks.
- Understand how to use computer experiments to explore the characteristics of
a process and aid in formalizing mathematical framework.
- Practice with figuring out how to use exploratory data analysis and diagnostics statistics on the function output to check code.
-
Practice in profiling and improving code.
- Recognize that the way one might describe a procedure to someone is not necessarily the best way to code it. In this case, the relationships between the Poisson, exponential and uniform probability distributions cans be used to write the simulation, or the Poisson process can be used.
- Simulation studies can help students understand random variables and processes.
Prerequisite Computational Topics
- Basic data structures
- Control flow
- Writing functions
Target Audience
Undergraduates who have had some probability. The project can be extended
for graduate students.
Breakdown of Computational Tasks
Solutions
Some information that you may find useful about the Poisson process:
-
A Poission(lambda) process on a fixed interval, say (0, t), will have a random number of hits in that interval, where the number of hits follows the Poisson(lambda * t) distribution. A hit in this case is the birth of a child.
-
The location of these hits follows the uniform distribution on (0,t).
One approach
- What are the main ideas?
- For a given individual, we need to generate his offspring, meaning their birthdays. To do this we see that we need to know when the individual dies as well as the rate for producing offspring.
- To figure out the lifespan of an individual, we need two pieces of
information: how long he lives after his father dies (this is a randomly generated time independent of all other information), and how old he is when his father dies.
The lifespan will be the sum of these two values.
The age at father's death depends on the birthdate and father's lifespan.
- Simple scripts to get started
- The above looks a bit circular, so let's start with the first individual and figure out his information. The lifespan is just lifespan = rexp(1, kappa). Now that we have the lifespan, we can generate the children's birthdates. We use the properties listed above, and first generate the number of children numKids = rpois(1, lifespan*lambda). Then their birthdates are uniformly distributed on the interval (0, lifespan).
- We extend to the second generation information. We have the father's lifespan and the individual's birthdates (actually the age of the father when the children are born). This is enough to generate the death dates for all of the children. Once we have the death dates, we generate the birthdates for the third generation, and so on.
- Wrap up the scripts into helper functions. Take the generation of birthdates for individual.
- What should the inputs be to the function?
- How should the information be returned?
- Repeat for the other sub tasks
-
Move on to the wrapper function:
The arguments for the function are specified as:
- lambda = birth rate
- kappa = assassination rate
- maxGen = the maximum number of generations (this is used to control the
length of the simulation)
The return value from the function is to be a list with one element per generation. Each element will be a data
frame, with have one row for each person born in that generation. The data frame will have
four variables, parentI D the identifier for the parent, childI D the identifier for the child
(a number from 1 to n, the total number of children born in that generation), birth date,
assination date.
Data structure enters the picture here when we fiugre out how to store this information. Each individual will have a different number of children and these children will have a different number of children. Also how to handle the case where there are no offspring needs to be dealt with.
- Debugging:
Although the code may run without producing errors, provide evidence that the simulation is functioning correctly. This evidence could be in the form of summary statistics, such as the average number of children each person has (over many runs of the
simulation), the average life span of a person in a generation, etc. or in the form of plots.
- Profiling and improving code.
- Now that the code is written, how might you better organize it for clarity, generality, and reusability?
- What information does profiling your code provide you to re-write your code more efficiently?
- Sample space exploration.
-
Run the simulation multiple times for various values of lambda and kappa to see if you can determine which values will lead to the family dying out.
-
What is the behaviior of the process for various values of lambda and kappa? Is there a transition point where the process goes from being stable (meaning that the process dies out before time t with certainty) to unstable?
Hint: It has been shown that 1/4 is the transition point, when lambda is less than 1/4 the process is stable.
-
Can you corroborate the following statement with your simulation output?
A scaling argument can be used to show that a birth-and-assassination process with intensities (lambda,kappa) and a birth-and-assassination process with intensities (lambda/kappa , 1) have the same distribution.
Cautionary Notes
The following are questions that students posted on the class discussion board.
Question: Can someone clarify what the "parentID" and "childID" refer to?
Does "parentID" refer to that person's parents? Or a number that refers to him/herself? If the latter, then should every parent have a different number?
Response:
All people in the same generation should have a unique id - an integer from 1 to the number of people in that generation is fine.
The row/record for a person should have a parent ID, the identifier for their parent, and a personID. So childID is a bit of a misnomer. It should be parentID and personID. Thes IDs need to be unique within a generation, but do not need to be unique across generations.
Question: Do I use rpois one time to determine the number of children, and a second time to determine the birthdays of each child?
I'm also not sure what I am supposed to use runif for.
Response:
rpois should give the number of kids and hence the number of birthdays you need to generate
I believe runif is used for the birthdays-
runif (number of birthdays you're generating, birth of parent, death of parent)
Question: Is it possible to coerce a list of numbers with varying row lengths into a numeric vector? I've managed to do this only by catenating each row of the list together (eg. if g is a list of 3 items -> x=c(g[[1]],g[[2]],g[[3]]) ) but I haven't figured out how to write that into a function.
Any tips?
Response:
Here are a couple of approaches:
unlist(g) - will turn a list into a vector
for (i in 1:length(g)) {
code to successively catenate each element of g into a new vector
}
Question: I am stuck as to how we are supposed to run various simulations for kappa and lambda. Are we supposed to use some sort of r random generator function to obtain values.
Response:
kappa and lambda should be arguments in your overall function. you can provide different values and see what happens to the birth and assassination process.
Question: So I've created the head of the family and I made another dataframe that generates his children and all their information (birthtime, death time, number, parent). Everything follows the right distributions (hopefully). The problem is that I have no idea how to create further generations, since I can't figure out a way to only index certain parent IDs and have them create offspring who create offspring.
Response:
What I did was I started with the first parent in generation 3 (assuming there is at least one parent). I found the number of kids the first parent had, each childs' parentID (which is one), their birth date, their assassination date, and their childID. Then, I used a for loop to do the same for any other parents in this generation and their kids (if they had any).
Plus lots of questions about control flow.
Ways to Extend the Topic
-
There are many possibilities for the birth-assassination process.
Sample from more complicated distributions, e.g. a mixture
model with different types of offspring, e.g. weak, moderate
and strong which have different expected life times;
or from a distribution for which there is no built-in
random generator function and so use RNG techniques such as
Inverse CDF or Acceptance-Rejection sampling.
- The student may want to experiment with recursion rather than
an iterative approach (or vice verse).
- Repeat the assignment where we re-write to (further) improve the structure,
break out subtasks into functions, and make code more efficient.
Other Directions
- One could find ways to represent the process using animation,
e.g.
- displaying the "family" tree;
- use SVG or Flash to animate the process.
Similar Exercises/Projects
-
- Ad hoc networks
-
-
- BML car model and phase transition.
-
-
- "Beat the CLT".
-
-
- Random Number Generation
-
Grading Suggestions
We have specified the function name, arguments, and output format in order to
test the code programmatically. In addition, we require students to writeup
their findings from the response surface exploration.