Как создать список случайных целочисленных векторов, сумма которых равна x
Создание случайного вектора, сумма которого равна X (например, X = 1000), довольно проста:
import random
def RunFloat():
Scalar = 1000
VectorSize = 30
RandomVector = [random.random() for i in range(VectorSize)]
RandomVectorSum = sum(RandomVector)
RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
return RandomVector
RunFloat()
Приведенный выше код создает вектор, значения которого являются числами с плавающей точкой, а сумма равна 1000.
Мне трудно создать простую функцию для создания вектора, значения которого являются целыми числами, а сумма равна X (например, X = 1000 * 30)
import random
def RunInt():
LowerBound = 600
UpperBound = 1200
VectorSize = 30
RandomVector = [random.randint(LowerBound,UpperBound) for i in range(VectorSize)]
RandomVectorSum = 1000*30
#Sanity check that our RandomVectorSum is sensible/feasible
if LowerBound*VectorSize <= RandomVectorSum and RandomVectorSum <= UpperBound*VectorSum:
if sum(RandomVector) == RandomVectorSum:
return RandomVector
else:
RunInt()
У кого-нибудь есть предложения по улучшению этой идеи? Мой код может никогда не закончиться или столкнуться с проблемами глубины рекурсии.
Edit (July 9, 2012)Спасибо Оливеру, Мгилсону и Дугалу за их вклад. Мое решение показано ниже.
Oliver was very creative with the multinomial distribution idea Put simply, (1) is very likely to output certain solutions more so than others. Dougal demonstrated that the multinomial solution space distribution is not uniform or normal by a simple test/counter example of Law of Large Numbers. Dougal also suggested to use numpy's multinomial function which saves me a lot of trouble, pain, and headaches. To overcome (2)'s output issue, I use RunFloat() to give what appears (I haven't tested this so its just a superficial appearance) to be a more uniform distribution. How much of a difference does this make compared to (1)? I don't really know off-hand. It's good enough for my use though. Thanks again to mgilson for the alternative method that does not use numpy.Вот код, который я сделал для этого редактирования:
Edit #2 (July 11,2012)Я понял, что нормальное распределение не правильно реализовано, с тех пор я изменил его следующим образом:
import random
def RandFloats(Size):
Scalar = 1.0
VectorSize = Size
RandomVector = [random.random() for i in range(VectorSize)]
RandomVectorSum = sum(RandomVector)
RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
return RandomVector
from numpy.random import multinomial
import math
def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
"""
Inputs:
ListSize = the size of the list to return
ListSumValue = The sum of list values
Distribution = can be 'uniform' for uniform distribution, 'normal' for a normal distribution ~ N(0,1) with +/- 5 sigma (default), or a list of size 'ListSize' or 'ListSize - 1' for an empirical (arbitrary) distribution. Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).
Output:
A list of random integers of length 'ListSize' whose sum is 'ListSumValue'.
"""
if type(Distribution) == list:
DistributionSize = len(Distribution)
if ListSize == DistributionSize or (ListSize-1) == DistributionSize:
Values = multinomial(ListSumValue,Distribution,size=1)
OutputValue = Values[0]
elif Distribution.lower() == 'uniform': #I do not recommend this!!!! I see that it is not as random (at least on my computer) as I had hoped
UniformDistro = [1/ListSize for i in range(ListSize)]
Values = multinomial(ListSumValue,UniformDistro,size=1)
OutputValue = Values[0]
elif Distribution.lower() == 'normal':
"""
Normal Distribution Construction....It's very flexible and hideous
Assume a +-3 sigma range. Warning, this may or may not be a suitable range for your implementation!
If one wishes to explore a different range, then changes the LowSigma and HighSigma values
"""
LowSigma = -3#-3 sigma
HighSigma = 3#+3 sigma
StepSize = 1/(float(ListSize) - 1)
ZValues = [(LowSigma * (1-i*StepSize) +(i*StepSize)*HighSigma) for i in range(int(ListSize))]
#Construction parameters for N(Mean,Variance) - Default is N(0,1)
Mean = 0
Var = 1
#NormalDistro= [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues]
NormalDistro= list()
for i in range(len(ZValues)):
if i==0:
ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
NormalDistro.append(ERFCVAL)
elif i == len(ZValues) - 1:
ERFCVAL = NormalDistro[0]
NormalDistro.append(ERFCVAL)
else:
ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2))
ERFCVAL = ERFCVAL1 - ERFCVAL2
NormalDistro.append(ERFCVAL)
#print "Normal Distribution sum = %f"%sum(NormalDistro)
Values = multinomial(ListSumValue,NormalDistro,size=1)
OutputValue = Values[0]
else:
raise ValueError ('Cannot create desired vector')
return OutputValue
else:
raise ValueError ('Cannot create desired vector')
return OutputValue
#Some Examples
ListSize = 4
ListSumValue = 12
for i in range(100):
print RandIntVec(ListSize, ListSumValue,Distribution=RandFloats(ListSize))
Код выше можно найти наGitHub, Это часть класса, который я построил для школы. user1149913, также выложил приятное объяснение проблемы.