➜  spark-1.6.1-bin-hadoop2.6 IPYTHON=1 bin/pyspark
Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
Type "copyright", "credits" or "license" for more information.

IPython 2.4.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/06/21 23:57:06 INFO SparkContext: Running Spark version 1.6.1
16/06/21 23:57:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.11+ (default, Apr 17 2016 14:00:29)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: lines = sc.textFile("README.md")
If even that feels like too much typing, you can add a line to ~/.bashrc:
alias ipython='IPYTHON=1'
and then launch it like this:
➜  ~ cd /usr/dev/spark-1.6.1-bin-hadoop2.6
➜  spark-1.6.1-bin-hadoop2.6 ipython bin/pyspark
In [1]: lines = sc.textFile("README.md")

In [2]: lines.count()
16/06/21 23:57:41 INFO FileInputFormat: Total input paths to process : 1
16/06/21 23:57:41 INFO SparkContext: Starting job: count at <ipython-input-2-44aeefde846d>:1
16/06/21 23:57:41 INFO DAGScheduler: Got job 0 (count at <ipython-input-2-44aeefde846d>:1) with 2 output partitions
16/06/21 23:57:41 INFO DAGScheduler: Final stage: ResultStage 0 (count at <ipython-input-2-44aeefde846d>:1)
16/06/21 23:57:41 INFO DAGScheduler: Parents of final stage: List()
16/06/21 23:57:41 INFO DAGScheduler: Missing parents: List()
16/06/21 23:57:41 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at <ipython-input-2-44aeefde846d>:1), which has no missing parents
16/06/21 23:57:41 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 173.1 KB)
16/06/21 23:57:41 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 176.6 KB)
16/06/21 23:57:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36609 (size: 3.4 KB, free: 511.5 MB)
16/06/21 23:57:41 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/06/21 23:57:41 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at <ipython-input-2-44aeefde846d>:1)
16/06/21 23:57:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/06/21 23:57:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2151 bytes)
16/06/21 23:57:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2151 bytes)
16/06/21 23:57:41 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/06/21 23:57:41 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/06/21 23:57:41 INFO HadoopRDD: Input split: file:/usr/dev/spark-1.6.1-bin-hadoop2.6/README.md:0+1679
16/06/21 23:57:41 INFO HadoopRDD: Input split: file:/usr/dev/spark-1.6.1-bin-hadoop2.6/README.md:1679+1680
16/06/21 23:57:41 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/06/21 23:57:41 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/06/21 23:57:41 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/06/21 23:57:41 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/06/21 23:57:41 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/06/21 23:57:42 INFO PythonRunner: Times: total = 283, boot = 268, init = 14, finish = 1
16/06/21 23:57:42 INFO PythonRunner: Times: total = 291, boot = 272, init = 19, finish = 0
16/06/21 23:57:42 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2124 bytes result sent to driver
16/06/21 23:57:42 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2124 bytes result sent to driver
16/06/21 23:57:42 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 407 ms on localhost (1/2)
16/06/21 23:57:42 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 393 ms on localhost (2/2)
16/06/21 23:57:42 INFO DAGScheduler: ResultStage 0 (count at <ipython-input-2-44aeefde846d>:1) finished in 0.423 s
16/06/21 23:57:42 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/06/21 23:57:42 INFO DAGScheduler: Job 0 finished: count at <ipython-input-2-44aeefde846d>:1, took 0.489031 s
Out[2]: 95
In [3]: lines.first()
16/06/21 23:58:15 INFO SparkContext: Starting job: runJob at PythonRDD.scala:393
16/06/21 23:58:15 INFO DAGScheduler: Got job 1 (runJob at PythonRDD.scala:393) with 1 output partitions
16/06/21 23:58:15 INFO DAGScheduler: Final stage: ResultStage 1 (runJob at PythonRDD.scala:393)
16/06/21 23:58:15 INFO DAGScheduler: Parents of final stage: List()
16/06/21 23:58:15 INFO DAGScheduler: Missing parents: List()
16/06/21 23:58:15 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[3] at RDD at PythonRDD.scala:43), which has no missing parents
16/06/21 23:58:15 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.8 KB, free 181.3 KB)
16/06/21 23:58:15 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.0 KB, free 184.3 KB)
16/06/21 23:58:15 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:36609 (size: 3.0 KB, free: 511.5 MB)
16/06/21 23:58:15 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/06/21 23:58:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (PythonRDD[3] at RDD at PythonRDD.scala:43)
16/06/21 23:58:15 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
16/06/21 23:58:15 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2151 bytes)
16/06/21 23:58:15 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
16/06/21 23:58:15 INFO HadoopRDD: Input split: file:/usr/dev/spark-1.6.1-bin-hadoop2.6/README.md:0+1679
16/06/21 23:58:15 INFO PythonRunner: Times: total = 41, boot = -33244, init = 33285, finish = 0
16/06/21 23:58:15 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 2143 bytes result sent to driver
16/06/21 23:58:15 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 67 ms on localhost (1/1)
16/06/21 23:58:15 INFO DAGScheduler: ResultStage 1 (runJob at PythonRDD.scala:393) finished in 0.068 s
16/06/21 23:58:15 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/06/21 23:58:15 INFO DAGScheduler: Job 1 finished: runJob at PythonRDD.scala:393, took 0.082887 s
Out[3]: u'# Apache Spark'
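The threading examples that follow rely on a loop worker and a loops list defined earlier in the original post. Judging from the outputs below (loop 1 finishes after 2 seconds, loop 0 after 4, and the MyThread run later prints results 4 and 3, i.e. nloop + nsec), a minimal stand-in would be something like this; the exact names and message wording are reconstructed, not copied from the source:

import threading
from time import sleep, ctime

loops = [4, 2]  # seconds each worker sleeps (inferred from the output below)

def loop(nloop, nsec):
    print 'Thread started', nloop, 'at', ctime()
    sleep(nsec)              # stand-in for real work
    print 'loop', nloop, 'finished at', ctime()
    return nloop + nsec      # a value for the MyThread example below to collect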
def main():
    print 'Main thread started:', ctime()
    threads = []                    # list that holds the thread objects
    n_loops = range(len(loops))
    for i in n_loops:
        t = threading.Thread(target=loop, args=(i, loops[i]))  # create one Thread instance per loop
        threads.append(t)           # collect the new object in the list

    for i in n_loops:
        threads[i].start()          # start each thread

    for i in n_loops:
        threads[i].join()           # wait for the child threads to finish

    print 'Main thread finished:', ctime()

if __name__ == '__main__':
    main()
The output is:
Main thread started: Sat Jun 18 19:47:40 2016
Thread started Thread started 0 at Sat Jun 18 19:47:40 20161 at Sat Jun 18 19:47:40 2016
loop 1 finished at Sat Jun 18 19:47:42 2016
loop 0 finished at Sat Jun 18 19:47:44 2016
Main thread finished: Sat Jun 18 19:47:44 2016

(The second line is the two threads' startup messages interleaved: both write to stdout at the same instant.)
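Plain threading.Thread gives no way to get a worker's return value back, which is why the next version swaps in a custom MyThread class. Its definition sits in an earlier part of the post; a minimal sketch consistent with how it is used below (a func/args constructor and a result attribute) would be:

import threading

class MyThread(threading.Thread):
    """Thread subclass that stores the callable's return value in self.result."""
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        self.result = self.func(*self.args)  # stash the return value for the caller to read after join()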
def main():
    print 'Main thread started:', ctime()
    threads = []                    # list that holds the thread objects
    n_loops = range(len(loops))
    for i in n_loops:
        t = MyThread(func=loop, args=(i, loops[i]))  # create the objects with our own class
        threads.append(t)           # collect the new object in the list

    for i in n_loops:
        threads[i].start()          # start each thread

    for i in n_loops:
        threads[i].join()           # wait for the child threads to finish

    for i in n_loops:
        print 'Result:', threads[i].result  # print the attribute stored on the object

    print 'Main thread finished:', ctime()

if __name__ == '__main__':
    main()
The result is:
Main thread started: Sat Jun 18 20:11:51 2016
Result: 4
Result: 3
Main thread finished: Sat Jun 18 20:11:51 2016
end
time start:2016-06-19 18:08:12
time done:2016-06-19 18:08:12
100
time start:2016-06-19 18:08:12
time done:2016-06-19 18:08:12
101
time start:2016-06-19 18:08:12
time done:2016-06-19 18:08:12
102
time start:2016-06-19 18:08:14
time done:2016-06-19 18:08:14
103
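The func worker handed to the pool below is defined earlier in the post. Judging by the output that follows (start/end timestamps two seconds apart, and each result coming back as 'done hello N'), a minimal stand-in would be:

import time

def func(msg):
    print 'time start:%s' % time.strftime('%Y-%m-%d %H:%M:%S')
    time.sleep(2)                     # simulate two seconds of work
    print 'time end:%s' % time.strftime('%Y-%m-%d %H:%M:%S')
    return 'done ' + msg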
import multiprocess

if __name__ == '__main__':
    pool = multiprocess.Pool(2)     # a pool with two worker processes
    result = []

    for i in range(3):
        msg = 'hello %s' % i
        result.append(pool.apply_async(func=func, args=(msg,)))  # submit asynchronously, keep the AsyncResult handles

    pool.close()                    # no more tasks will be submitted
    pool.join()                     # wait for the workers to finish

    for res in result:
        print '***: %s' % res.get() # fetch each task's return value

    print 'end'
Run it and see what the return values look like:
time start:2016-06-19 18:43:22
time start:2016-06-19 18:43:22
time end:2016-06-19 18:43:24
time start:2016-06-19 18:43:24
time end:2016-06-19 18:43:24
time end:2016-06-19 18:43:26
***: done hello 0
***: done hello 1
***: done hello 2
end
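The timestamps show Pool(2) scheduling at work: the first two tasks start together at 18:43:22, and the third only starts at 18:43:24, once a worker frees up. If you only need the results in order and not the per-task handles, Pool.map is a blocking shorthand for the same thing; a sketch, assuming the same func as above:

import multiprocess

if __name__ == '__main__':
    pool = multiprocess.Pool(2)
    msgs = ['hello %s' % i for i in range(3)]
    print pool.map(func, msgs)  # blocks until all tasks finish; -> ['done hello 0', 'done hello 1', 'done hello 2']
    pool.close()
    pool.join()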
$ cat /usr/dev/idea-IU-139.1117.1/bin/idea.sh
# ---------------------------------------------------------------------
# Locate a JDK installation directory which will be used to run the IDE.
# Try (in order): IDEA_JDK, JDK_HOME, JAVA_HOME, "java" in PATH.
# ---------------------------------------------------------------------
if [ -n "$IDEA_JDK" -a -x "$IDEA_JDK/bin/java" ]; then
  JDK="$IDEA_JDK"
elif [ -n "$JDK_HOME" -a -x "$JDK_HOME/bin/java" ]; then
  JDK="$JDK_HOME"
elif [ -n "$JAVA_HOME" -a -x "$JAVA_HOME/bin/java" ]; then
  JDK="$JAVA_HOME"
else
  JAVA_BIN_PATH=`which java`
.......
@app.route('/')
def show_entries():
    # newest entries first
    cur = g.db.execute('select title, text from entries order by id desc')
    entries = [dict(title=row[0], text=row[1]) for row in cur.fetchall()]
    return render_template('show_entries.html', entries=entries)
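show_entries is the read side of the classic Flaskr tutorial app. For context, the matching write side from that tutorial looks roughly like the sketch below; the /add route and form field names follow the tutorial's conventions rather than anything shown in this post:

from flask import g, request, redirect, url_for

@app.route('/add', methods=['POST'])
def add_entry():
    # insert the submitted form fields, then go back to the listing
    g.db.execute('insert into entries (title, text) values (?, ?)',
                 [request.form['title'], request.form['text']])
    g.db.commit()
    return redirect(url_for('show_entries'))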