Your Local Tautologist

Tuesday, May 28, 2013

The Monty Hall Problem and a Simulation in R

The Monty Hall problem has possibly the most counter-intuitive result in probability. An in-depth description is here, but the set-up is easy. On the game show "Let's Make a Deal", the contestant would be given the choice of three doors, behind one of which was a new car. Monty would show the contestant what was behind one of the other doors. Often it would be a goat or some other non-prize. The key point was that, after showing the contestant the non-prize door, Monty would offer the contestant the chance to switch his choice of doors to the remaining unopened door.

At first glance, there doesn't seem to be any reason for the contestant to switch. Now that there are only two doors left, he has a 50/50 shot of having the car behind his original door, right? Nope. It turns out that he should absolutely switch.

Why? The best way I've found to think about it like this... You pick a door, then Monty gives you the chance to trade your door for both of the other doors. In that case it's clear you should switch. That's essentially what is happening in the game, except that Monty is showing you what's behind one of the two other doors ahead of time. Each door has a 1/3 chance of having a car behind it, so, of course, you want two doors instead of one.

Since this is such an interesting problem with a simple set-up, I though I would write a simulation in R to verify that switching really did give the contestant a 2/3 chance of winning a car.

monty.hall.sim <- function(trials=50000){

    #Returns win percentage of those who switch

    car.location <- sample(c('A','B','C'), trials, replace = TRUE)
    player.pick <- sample(c('A','B','C'), trials, replace = TRUE)
    door.shown <- rep(0,trials)
    door.switched.to <- rep(0,trials)

    for(i in 1:trials){
        all.doors <- c('A', 'B', 'C')
        available.doors <- all.doors[!all.doors %in% c(car.location[i], player.pick[i])]
    door.shown[i] <- sample(available.doors,1)
        available.door.for.switch <- all.doors[!all.doors %in% c(player.pick[i], door.shown[i])]
        door.switched.to[i] <- available.door.for.switch
    }

    player.wins.originally <- as.numeric(car.location == player.pick)
    player.wins.after.switch <- as.numeric(car.location == door.switched.to)

    #For checking
    #MH.df <- data.frame(car.location, player.pick, door.shown, door.switched.to, player.wins.originally, player.wins.after.switch)

    nonswitching.win.pct <- sum(player.wins.originally) / trials
    switching.win.pct <- sum(player.wins.after.switch) / trials

    return(switching.win.pct)
}

I wrote this as a function so that I could time it. There are faster ways of doing it, but this works for now.

Friday, February 15, 2013

Monte Carlo simulation of exoplanet transit light curves in R

I've posted about the Kepler mission before, but this is a different approach. While it's easy to find all sorts of animations and diagrams showing how the light from a star is blocked over time as a planet transits in front of it, I got to thinking how those "ideal' light curves are found. I figured it was probably possible with some sort of convolution, but I wasn't really sure. Finding a function to determine the area of an intersection of two moving circles didn't sound like a lot of fun anyways. But, I thought it would be interesting to attack it as a simulation.

Since I've been fooling around with R lately, that's what I used. My code is below. It is a very simplified model, simulating one flat disk transiting another. The code produces random points in a box around the "planet" and discards the ones not within the planet's radius. Those points are then moved to the appropriate place in the transit across the "star", as determined by the step in the loop. The points are checked to see if they lay within the star's radius, and that count is divided by the total number of points to determine the proportion of the planet's area that is intersecting the star. That area is then divided by the star's area to find how much of the star's light is blocked. Those proportions are collected at each point of the planet's trip and can then a simulated light curve can be charted.

The ratio of planet to star radius can be changed, as can the crossing latitude. The simulation parameters of number of points generated and transit step length can also be changed. Again, this is nothing fancy and there are probably much better ways of doing this, but it was a fun project to chew on.

star.radius <- 1 # Conceptually, it's easier if this is 1
planet.radius <- .20

star.crossing.lat <- 88
lat.radians <- star.crossing.lat * ((2 * pi) / 360)
star.crossing.y <- star.radius * sin(lat.radians)

step.length <- .001
planet.x.pos <- seq((-2 * star.radius), (2 * star.radius), step.length) # Position of center of planet position, in terms of stellar radius.
steps <- as.integer(length(planet.x.pos))

point.count <- 25000
box.point.count <- ceiling(point.count * (4 / pi) * 1.1) # Can be wasteful

light.curve <- vector(length = steps, mode = "numeric")

counter <- 1
for(i in planet.x.pos){

#Generate poins on planet
box.points_df <- data.frame(runif(box.point.count, min = -planet.radius, max = planet.radius), runif(box.point.count, min = -planet.radius, max = planet.radius))
colnames(box.points_df) <- c('x', 'y')
box.points_df$r <- sqrt(box.points_df$x ^ 2 + box.points_df$y ^ 2)
box.points_df$in.circle <- box.points_df$r <= planet.radius
circle.points_df <- box.points_df[box.points_df$in.circle == TRUE, ]
circle.points_df <- circle.points_df[1:point.count, ]

#Move planet points into position
circle.points_df$x <- circle.points_df$x + i
circle.points_df$y <- circle.points_df$y + star.crossing.y

#Check if points intersect with star
planet.point.check <- sqrt((circle.points_df$x) ^ 2 + (circle.points_df$y) ^ 2)
intersect.points <- sum(planet.point.check < star.radius)

#Calculate relative flux
pct.planet.intersect <- intersect.points / nrow(circle.points_df)
area.intersect <- pct.planet.intersect * pi * planet.radius ^ 2
relative.flux <- ((pi * star.radius ^ 2) - area.intersect) / (pi * star.radius ^ 2)
light.curve[counter] <- relative.flux

counter <- counter + 1
}

And here is a sample of the output for a transit for the parameters in the code above.

Tuesday, November 27, 2012

A slightly broken A. G. Russell FeatherLite One Hand Knife

Recently, I switched to using an A. G. Russell FeatherLite One Hand Knife as my everyday carry knife. After a long period with the CRKT M16, I wanted a change of pace. I had had the A. G. Russell for a while, but never put it to much use.

In general, it was a good knife. Lighter than the M16, it felt a bit 'flimsy', but it always worked. In fact, the lighter weight (1.5 oz according to A. G. Russell) and rounded corners and edges made it more comfortable to carry. However, the measures taken to reach that light weight may have compromised it.

After using it one day to open some envelopes, I found that I could not easily close it. The locking mechanism includes a small plastic tab to allow one to pull the lock back to release the blade. On my knife, that tab had broken off. It can be closed without that tab with some extra effort, but that leaves a significant, sharp steel corner on the knife.

Note the missing plastic tab behind the blade.

The tab itself is a fairly small piece of plastic (Zytel, I'm guessing). I found it broken in two pieces in my pocket. It must have finally broken in my pocket sometime after the last time I had used it. I don't know how prevalent this mode of failure is for this knife, but I would guess I'm not the first one to experience it. The tab broke around the screw hole, where a lot of the stress would be place when using the tab.

Closeup of lock release without tab.

Broken tab pieces

That being said, I have always had good experiences with A. G. Russel's customer service. I'm going to ship this knife back and I expect that they will probably either replace, refund, or otherwise compensate for this defect.

Update: After getting in touch with them, they offered to either repair it or mail me a replacement tab. I chose the replacement tab.

Tuesday, September 11, 2012

Knife review: CRKT M16-10KZ

For my first review of, well, anything on this blog, I present the Columbia River Knife and Tool M16-10KZ. This knife has been my every day carry knife for the past year or so. I ran across it at a store and picked it up on a whim. At maybe $20 or $30 at the time, it was not a huge investment to try this knife out.

My well-used CRKT M16-10KZ. Yes, a thumb stud is missing here. I blame the washing machine.

Key features:

Pocketable size
Glass fiber reinforced nylon handle
Single-handed opening
Liner lock with safety

First, it's about 4 inches closed, making it easy to carry in a pocket without the clip. I don't like advertising the fact I have a knife, so this is a major plus. Also, this is not a wide knife, making it even easier to pocket.

I keep this knife in the same pocket as my keys. The reinforced nylon handle holds up well to the wear that it experiences there. I've also dropped it countless times and the handle is none the worse for it.

Maybe more than anything else, the ability to open it with a single hand is a great convenience. With a small flick of the flipper or the thumb stud, it will open and lock without issue. Closing it one-handed it a little more complicated but becomes easy after a couple minutes of practice.

The safety on the lock (called AutoLAWKS according to CRKT) is excellent. I've always been a little leery of liner lock knives when cutting something that takes significant effort. I get nervous that, when applying a lot of force, I may inadvertently force the liner lock open and close the knife on my hand. On the M16, the safety is on the opposite side of the knife, making it very difficult to operate both while cutting.

Conclusion:

Great EDC knife for the price.

Wednesday, August 29, 2012

Finding prime factors with Pollard's rho and Rabin-Miller algorithms in Python

Like I've said in a previous post, I have a thing for prime numbers. Factoring falls along that same line. After I while I thought it would be interesting to write some code to return the prime factors of an input number, mostly because I wanted to see how fast my old machine would able to do it.

I decided on Pollard's rho algorithm because it is easy to code and a whole lot faster than trial division. This would also need a primality (or compositeness) test, so since I had already coded one for my prime number generator, I stuck with Rabin-Miller. There are some trial divisions in there, but it's just to catch the small factors.

The hardest part of this, for me, was coming up with a way for Python to handle the list of intermediate factors. The way it's coded works, but I am sure it could be done better. That actually applies for the rest of it as well. An earlier version of this tore though the 8th Fermat number in a reasonable amount of time, so this works for me.

Maybe I'll update this with a different factoring algorithm and significantly cleaner code as a follow-up project.

Here's the code:

import math
import random
import fractions

################################################

def removerepeats(alist):

   alist.sort()

   for element in alist:
       while alist.count(element)>1:
           alist.remove(element)

   return(alist)

################################################

def milrabprimality(totest):

   if totest==1:
       return(-1)

   smallprimes=[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211]

   if totest in smallprimes:
       return(1)

   for smtest in smallprimes:
       if totest%smtest==0 and totest!=smtest:
           return(-1)

   s=totest-1
   twos=0
   while s%2==0:       #Figures out the seed and powers of two
       s=s//2
       twos+=1

   eventest=totest-1   #Make even -> eventest=(2^r)*s

   trials=10           #Change if you want to

   agen=0
   a=[]
   while agen<trials:
       newa=random.randint(5,25)
       if newa not in a:
           a.append(newa)
           agen+=1

   trial1count=0
   totprime=0

   while trial1count<trials:       #Miller-Rabin test happens here
       primecount=0
       part1=0

       #Part 1
       res1=pow(a[trial1count],s,totest)
       if res1==1:
           primecount=1
           part1=1

       #Part2
       twopow=twos-1
       itercount=0
       if part1==0:               #Only do this loop if part 1 was unsuccessful
           while twopow>=(0):
               res2=pow(a[trial1count],s*(2**twopow),totest)
               twopow-=1
               itercount+=1
               if res2==(totest-1):
                   primecount+=1
                   break

       if primecount!=0:
           totprime+=1
       else:
           return(-1)       #This ends it early since it has to pass for all a

       trial1count+=1       #increments the random base

   if totprime==trial1count:
       return(1)

################################################

def pollardwrapper(number):

   factors=[]
   nonprimefactors=[]
   factors.append(number)

   if milrabprimality(number)==1:       #Just return it if prime
       return(factors)

   for factor in factors:
       if milrabprimality(factor)!=1:
           result1=pollardsrho(factor)
           result2=factor//result1
           factors.append(result1)
           factors.append(result2)
           nonprimefactors.append(factor)

   for element in nonprimefactors:
       factors.remove(element)

   factors.sort()
   factors=removerepeats(factors)

   return(factors)

################################################
def pollardsrho(number):

   if number==1:
       return(1)
   if number==2:
       return(2)
   if number==3:
       return(3)
   if number==4:
       return(2)

   prime=milrabprimality(number)
   if prime==1:
       return(number)

   #Algorithm from the Crandall and Pomerance book
   a=random.randint(1,number-3)
   s=random.randint(0,number-1)

   u=s
   v=s
   g=1

   while g==1:
       u=((u**2)+a)%number
       v=((v**2)+a)%number
       v=((v**2)+a)%number

       g=fractions.gcd((u-v),number)   #If g!=1 then g is a factor (unless it is equal to number).

       if g==number:                   #The bad seed case
           a=random.randint(1,number-3)
           s=random.randint(0,number-1)
           u=s
           v=s
           g=1

       if g!=1:
           return(g)   #End of algorithm

################################################

tofactor=int(input('\nWhat number do you want to factor? '))
pollardfactors=pollardwrapper(tofactor)
print('The prime factors of ',tofactor,'are',pollardfactors)

Monday, August 13, 2012

Autosquiggle

I had little to do today, so I wrote come code to doodle for me:

I'm not sure why you would want to do this, but here's my code in case you want to try it out. Play with the parameters to make different squiggles.

import matplotlib.pyplot as plt
import numpy as np

#######################################################################
def randangle(prevangle,variance):
    nextangle=np.random.normal(prevangle,variance)
    return(nextangle)
########################################################################
def thetagenerator(count):
    alltheangles=[]
    variance=3.0                #Can change this
    alltheangles.append(10.0)    #Can change this
    angcount=1
    
    while angcount<count:
        newangle=randangle(alltheangles[angcount-1],variance)
        if newangle>360:
            newangle=newangle%360
        alltheangles.append(newangle)
        angcount+=1

    return(alltheangles)
#######################################################################
        
totpoints=5000                    #Can change this
anglecount=0
x=[]
y=[]
x.append(0)        #First point
y.append(0)        #First point
radtodeg=360.0/(2*3.14159265359)
degtorad=1.0/radtodeg

segmentlength=0.01                #Can change this

angles=thetagenerator(totpoints)

points=1    #First point at (0,0)

while points<totpoints:
    currentangle=sum(angles[:points])%360
    
    x.append(x[points-1]+(segmentlength*np.cos(degtorad*currentangle)))
    y.append(y[points-1]+(segmentlength*np.sin(degtorad*currentangle)))
    
    points+=1
    
fig1=plt.figure(figsize=(10,10))    #Can change this
sub1=fig1.add_subplot(111,frameon=False,aspect="equal")
sub1.tick_params(axis='both',which='both',length=0, width=0, labelbottom=False,labeltop=False,labelleft=False,labelright=False)
sub1.plot(x,y)
plt.show()

Wednesday, August 8, 2012

Viewing Kepler Light Curves via Matplotlib and PyFITS

With all the attention that the Curiosity rover is getting, I thought I'd take a look back at the Kepler mission. The objective of Kepler is, essentially, to discover planets outside our own Solar System. The Kepler instrument is a space-based, wide-view telescope/photometer (see here for details).

It observes over 100,000 stars in Cygnus and Lyra, recording their brightness over time. If a star has a planet and its orbit is aligned in the line of sight, the planet will pass across the face of the star, partially blocking its light. The Kepler instrument is sensitive enough to detect these changes for planets of sufficient size.

The public has access to the data generated by the project. The "meat" of the data are the light curves. These are available via a web interface here, but I wanted to see if I could get the same curves from the raw data myself.

The raw data is available in FITS-formatted files. FITS is a standard format used for image data but can contain non-image data and metadata as well. For Kepler, the light curves are stored as electron fluxes read from the sensor over each observation. Higher electron fluxes are brighter observations, and vice versa.

It turns out that there's at least one module that makes dealing with FITS files a lot easier: PyFITS. Using PyFITS, I was able to put together a simple program to show a light curve stored in a given file. Once you have this much, it can be a starting point to analyze the data from that source and from other sources. My code is below:

import matplotlib.pyplot as plt
import pyfits

filename=raw_input("\nWhat is the FITS file name? ")
FITSfile=pyfits.open(filename)

dataheader=FITSfile[1].header
topheader=FITSfile[0].header
jdadj=str(dataheader["BJDREFI"])
obsobject=str(dataheader["OBJECT"])

lightdata=FITSfile[1].data

flux=lightdata.field("PDCSAP_FLUX")     #Start with the PDCSAP_FLUX if
                                        #you're not sure which one to use.
                                        #This is the corrected electron
                                        #flux (electrons/sec) in the
                                        #optimal aperture after correction.

time=lightdata.field("TIME")            #Barycenter corrected Julian date

fig1=plt.figure()
sub1=fig1.add_subplot(111)
sub1.plot(time,flux,color="black",marker=",",linestyle="None")
xlab="Time (days, Julian date - %s)"%jdadj
sub1.set_xlabel(xlab)
sub1.set_ylabel("Brightness (electron flux)")
plottitle="Light Curve for %s"%obsobject
sub1.set_title(plottitle)

plt.show()

FITSfile.close()

Here is some sample output:

This is a light curve for KIC 8191672, a.k.a. Kepler-5. The planet Kepler-5b orbits this star with a period of about 3.55 days and is roughly twice as massive as Jupiter. The spots where the reading drops significantly from the average are the times when the planet is transiting across the face of the star, blocking a bit of the star's light. See here and here for more details on Kepler-5b.

The raw data by sources can be found here, listed by Kepler ID.