====== Python ======
----
2020-12-26 Wireshark tcpdump to neo4j plot
5.7.2. The “Export Packet Dissections” Dialog Box
This lets you save the packet list, packet details, and packet bytes as plain text, CSV, JSON, and other formats.
From

tshark -T json -r file.pcap
tshark -T json -j "http tcp ip" -x -r file.pcap

From

TShark is a network protocol analyzer. It lets you capture packet data from a live network, or read packets from a previously saved capture file, either printing a decoded form of those packets to the standard output or writing the packets to a file. TShark's native capture file format is pcapng, which is also the format used by Wireshark and various other tools.
From

"tshark.exe" -T json -j "http tcp ip" -r "\\SERVER\Db\Mc\br0-2020-07-16-17-40.txt" > "\\SERVER\Db\Mc\test.txt"

====== How to efficiently parse fixed width files? ======
2020-08-15
Here's a way to do it with string slices, as you were considering but were concerned might get too ugly. The nice thing about it, besides not being all that ugly, is that it works unchanged in both Python 2 and 3, as well as being able to handle Unicode strings. Speed-wise it is, of course, slower than the versions based on the struct module, but could be sped up slightly by removing the ability to have padding fields.

try:
    from itertools import izip_longest  # added in Py 2.6
except ImportError:
    from itertools import zip_longest as izip_longest  # name change in Py 3.x

try:
    from itertools import accumulate  # added in Py 3.2
except ImportError:
    def accumulate(iterable):
        'Return running totals (simplified version).'
        total = next(iterable)
        yield total
        for value in iterable:
            total += value
            yield total

def make_parser(fieldwidths):
    cuts = tuple(cut for cut in accumulate(abs(fw) for fw in fieldwidths))
    pads = tuple(fw < 0 for fw in fieldwidths)  # bool values for padding fields
    flds = tuple(izip_longest(pads, (0,)+cuts, cuts))[:-1]  # ignore final one
    parse = lambda line: tuple(line[i:j] for pad, i, j in flds if not pad)
    # optional informational function attributes
    parse.size = sum(abs(fw) for fw in fieldwidths)
    parse.fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                               for fw in fieldwidths)
    return parse

line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
fieldwidths = (2, -10, 24)  # negative widths represent ignored padding fields
parse = make_parser(fieldwidths)
fields = parse(line)
print('format: {!r}, rec size: {} chars'.format(parse.fmtstring, parse.size))
print('fields: {}'.format(fields))

Output:
format: '2s 10x 24s', rec size: 36 chars
fields: ('AB', 'MNOPQRSTUVWXYZ0123456789')
From:

====== Python 3 always stores text strings as sequences of Unicode code points. ======
2020-08-08
Python 3 always stores text strings as sequences of Unicode code points. These are values in the range 0-0x10FFFF. They don’t always correspond directly to the characters you read on your screen, but that distinction doesn’t matter for most text manipulation tasks.
From

====== UCS-2 is UTF-16 ======
2020-08-08
UCS-2 is UTF-16, really, for any codepoint that was assigned when it was still called UCS-2 in any case. Open it with encoding='utf16'. If there is no BOM (the byte order mark, 2 bytes at the start; for BE that'd be \xfe\xff), then use encoding='utf_16_be' to force a byte order.
From

There is a useful package in Python - chardet - which helps to detect the encoding used in your file. Actually, no program can say with 100% confidence which encoding was used - that's why chardet reports the encoding the file was most probably encoded with.
Chardet can detect the following encodings:
• ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
• Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
• EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
• EUC-KR, ISO-2022-KR (Korean)
• KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
• ISO-8859-2, windows-1250 (Hungarian)
• ISO-8859-5, windows-1251 (Bulgarian)
• windows-1252 (English)
• ISO-8859-7, windows-1253 (Greek)
• ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
• TIS-620 (Thai)
From

You can install chardet with a pip command: pip install chardet

import chardet
rawdata = open(file, "rb").read()
result = chardet.detect(rawdata)
charenc = result['encoding']
From

====== Attribute errno ======
2020-07-16
The attribute errno is defined only on OSError and classes inheriting from it. So apparently line 88 is part of a try...except clause, and in that line you're trying to use e.errno. You can't do that if the exception doesn't belong to the OSError exceptions family.
From

====== Classes: Make an empty file called __init__.py ======
2020-07-05
Classes: Make an empty file called __init__.py in the same directory as the files. That will signify to Python that it's "ok to import from this directory".
From

Same as previous, but prefix the module name with a . if not using a subdirectory:
from .user import User
from .dir import Dir
From

Python 3.3+ has Implicit Namespace Packages that allow it to create packages without an __init__.py file. Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely.
From

PEP 420 -- Implicit Namespace Packages
From

====== Python RegEx ======
2020-07-04
Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.
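For instance (the sample string below is made up for illustration), checking whether a string contains a date-like pattern:

```python
import re

# re.search returns a Match object when the pattern occurs anywhere
# in the string, and None otherwise.
text = "Backup finished on 2020-07-04 at midnight"  # made-up sample string
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
if match:
    print(match.group())  # 2020-07-04
```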
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions. Import the re module:
import re
From

====== Discovering millions of datasets ======
2020-04-02 RAW DATA
https://datasetsearch.research.google.com/
Discovering millions of datasets on the web
Natasha Noy, Research Scientist, Google Research
Published Jan 23, 2020
Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on puppy cognition. Or if you like skiing, you could find data on revenue of ski resorts or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is. Over the past year, people have tried it out and provided feedback, and now Dataset Search is officially out of beta.
From

====== Python Cheatsheet ======
2020-04-01
Comprehensive Python Cheatsheet
Contents
1. Collections: List, Dictionary, Set, Tuple, Range, Enumerate, Iterator, Generator.
2. Types: Type, String, Regular_Exp, Format, Numbers, Combinatorics, Datetime.
3. Syntax: Args, Inline, Closure, Decorator, Class, Duck_Type, Enum, Exception.
4. System: Exit, Print, Input, Command_Line_Arguments, Open, Path, OS_Commands.
5. Data: JSON, Pickle, CSV, SQLite, Bytes, Struct, Array, Memory_View, Deque.
6. Advanced: Threading, Operator, Introspection, Metaprograming, Eval, Coroutines.
7. Libraries: Progress_Bar, Plot, Table, Curses, Logging, Scraping, Web, Profile, NumPy, Image, Audio, Pygame.
From

====== Illustrated Guide to Python 3 ======
2020-01-19
Illustrated Guide to Python 3: A Complete Walkthrough of Beginning Python with Unique Illustrations Showing how Python Really Works.
Now covering Python 3.6 (Treading on Python) (Volume 1), 2nd Edition
From

====== retrieve all groups for a specific domain ======
2019-09-17
Retrieve all groups for a domain or the account
To retrieve all groups for a specific domain or the account, use the following GET request and include the authorization described in Authorize requests. For the query strings, request, and response properties, see the API Reference. For readability, this example uses line returns:

GET https://www.googleapis.com/admin/directory/v1/groups?domain=domain name
&customer=my_customer or customerId&pageToken=pagination token
&maxResults=max results

When retrieving:
• All groups for a sub-domain — Use the domain argument with the domain's name.
• All groups for the account — Use the customer argument with either my_customer or the account's customerId value. As an account administrator, use the string my_customer to represent your account's customerId. If you are a reseller accessing a resold customer's account, use the resold account's customerId. For the customerId value, use the account's primary domain name in the Retrieve all users in a domain operation's request. The resulting response has the customerId value.
• Using both domain and customer arguments — The API returns all the groups for the domain.
• Not using the domain and customer arguments — The API returns all the groups for the account associated with my_customer. This is the account customerId of the administrator making the API request.
From

====== APIs & Services ======
APIs & Services
Google API Dashboard
From

APIs Explorer
Learn more about using the Groups Settings API by reading the documentation.
From

Google API Client
This is the Python client library for Google's discovery based APIs. To get started, please see the docs folder. These client libraries are officially supported by Google. However, the libraries are considered complete and are in maintenance mode.
This means that we will address critical bugs and security issues but will not add any new features.
Installation
To install, simply use pip or easy_install:
pip install --upgrade google-api-python-client
From

Groups Settings API
Lets you manage permission levels and related settings of a group.
Documentation for the Groups Settings API in PyDoc.
samples/groupssettings — Sample for the Groups Settings API
From

====== Algorithms ======
2019-08-28
Algorithms by Jeff Erickson
🔥1st edition, June 2019 🔥 (Amazon links: US, UK, DE, ES, FR, IT, JP)
This web page contains a free electronic version of my self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.
From

====== You can still miss attachments ======
2019-08-23
You can still miss attachments by following @Ilya V. Schurov or @Cam T answers, because the email structure can differ based on the mimeType.
From

Gmail API: where to find body of email depending of mimeType
From

• Now with this service you can read your emails and read any attachments you may have in your e-mails.
• First you can query your e-mails with a search string to find the e-mail ids you need that have the attachments:
search_query = "ABCD"
result = service.users().messages().list(userId='me', q=search_query).execute()
msgs = result['messages']
msg_ids = [msg['id'] for msg in msgs]
• Now for each message id you can find the associated attachments in the email.
From

payload.headers[] list
List of headers on this message part. For the top-level message part, representing the entire message payload, it will contain the standard RFC 2822 email headers such as To, From, and Subject.
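As a sketch of that structure, pulling one header out of payload.headers looks like this. The message dict below is a made-up stand-in for a real Gmail API response, and get_header is a hypothetical helper, not part of the client library:

```python
# Hypothetical stand-in for a Gmail API message resource; only the fields
# relevant to header extraction are shown.
message = {
    "payload": {
        "headers": [
            {"name": "From", "value": "alice@example.com"},
            {"name": "To", "value": "bob@example.com"},
            {"name": "Subject", "value": "Quarterly report"},
        ]
    }
}

def get_header(msg, name):
    """Return the value of the first header with the given name, or None."""
    for h in msg["payload"]["headers"]:
        if h["name"] == name:
            return h["value"]
    return None

print(get_header(message, "Subject"))  # Quarterly report
```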
From

headers = messageheader["payload"]["headers"]
subject = [i['value'] for i in headers if i["name"] == "Subject"]
From

====== the Gmail API ======
the Gmail API
Complete the steps described in the rest of this page to create a simple Python command-line application that makes requests to the Gmail API.
From

Download Attachments from gmail using Gmail API

Remove all special characters, punctuation and spaces from string
Example 3
import re
re.sub('\W+', '', string)
• string1 - Result: 3.11899876595
• string2 - Result: 2.78014397621
From

====== Access Dates ======
Access Dates and then access the data using a loop:

for msg in msgs['messages']:
    m_id = msg['id']  # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    header = payload['headers']
    for item in header:
        if item['name'] == 'Date':
            date = item['value']

** DATA STORAGE FUNCTIONS ETC **
From

Python's strftime directives
Note: Examples are based on datetime.datetime(2013, 9, 30, 7, 6, 5)
From

====== Sort a Dictionary ======
2019-08-22
Python : How to Sort a Dictionary by Key or Value?
From

====== Plotly Cufflinks ======
2019-08-15
Interactive Plots with Plotly and Cufflinks on Pandas Dataframes
A simple and easy introduction to interactive visualisation with Plotly in Python.
Ozan, Oct 8, 2018 · 4 min read
Pandas is one of the most preferred and widely used tools in Python for data analysis. It also has its own built-in plot function. However, when it comes to interactive visualization, Python users face some difficulties if they don't have front-end engineering skills, since many libraries such as D3 and chart.js require some JavaScript background. This is where Plotly and Cufflinks come in handy.
From

====== Parsing text with Python ======
2019-07-11
Parsing text with Python
2018-01-07 · 2966 words · 14 minute read
python programming · parsing · python
I hate parsing files, but it is something that I have had to do at the start of nearly every project. Parsing is not easy, and it can be a stumbling block for beginners. However, once you become comfortable with parsing files, you never have to worry about that part of the problem.
From

====== DataFrames ======
2019-08-14
Pandas Tutorial: DataFrames in Python
Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.
From

====== Backblaze ======
2019-07-11
B2 python SDK Backblaze
This repository contains a client library and a few handy utilities for easy access to all of the capabilities of B2 Cloud Storage. The B2 command-line tool is an example of how it can be used to provide command-line access to the B2 service, but there are many possible applications (including FUSE filesystems, storage backend drivers for backup applications, etc.).
From

Backblaze is making two new APIs available that integrators and customers have been asking for: copy_file and copy_part. Together, the new functionality makes it easier to work with large files and to copy and manipulate files directly in B2.
From

====== Plotly ======
2019-07-01
import plotly.plotly as py
import plotly.graph_objs as go

data = [go.Scatter(x=[1, 2], y=[1, 2])]
layout = go.Layout(xaxis=dict(autorange='reversed'))
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='axes-reversed')
From

====== Regular Expression ======
2019-05-29
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
From

\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b
From

More efficient than re.findall() is re.finditer(regex, subject). It returns an iterator that enables you to loop over the regex matches in the subject string: for m in re.finditer(regex, subject). The for-loop variable m is a Match object with the details of the current match.
From

RegexMagic: Regular Expression Generator
From

====== Split the string at the last occurrence of sep ======
2019-05-23
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
From

====== The built-in os module has a number of useful functions ======
The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.
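A minimal sketch of that: os.scandir() yields DirEntry objects that carry the name plus type and stat information in one pass. A throwaway temporary directory is created so the example is self-contained:

```python
import os
import tempfile

# Create a throwaway directory with one file so the example is self-contained.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "example.txt"), "w") as f:
    f.write("hello")

# Each DirEntry exposes name, is_file()/is_dir(), and stat() without a
# separate os.stat() call per path.
listing = []
with os.scandir(tmp) as entries:
    for entry in entries:
        info = entry.stat()
        listing.append((entry.name, entry.is_file(), info.st_size))
        print(entry.name, entry.is_file(), info.st_size)
```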
From

====== Splitting, Concatenating, and Joining Strings in Python ======
2019-05-20
Splitting, Concatenating, and Joining Strings in Python
From

====== Regex Tester ======
Regex Tester
https://regex101.com/

====== processdokuwikifile ======
2019-05-15
def processdokuwikifile(in_file, par_out_file):
    """Read a DokuWiki export from in_file and write CSV rows to par_out_file."""
    #with open('C:\\Users\\An\\Desktop\\GoTo\\Listing2018-10-25-01-31-.txt','w') as outlog:
    with open(par_out_file, 'w', encoding='utf-8') as out_file:
        out = csv.writer(out_file)
        #with open('C:\\Users\\An\\Desktop\\GoTo\\Search2018-10-25-01-31-.txt','r') as log:
        with open(in_file, 'r', encoding='utf-8') as infile:

2019-05-07
For reference, the slide deck that I use to present on this topic is available here. All of the code and the sample text that I use is available in my Github repo here.
• Why parse files?
• The big picture
• Parsing text in standard format
• Parsing text using string methods
• Parsing text in complex format using regular expressions
• Step 1: Understand the input format
• Step 2: Import the required packages
• Step 3: Define regular expressions
• Step 4: Write a line parser
• Step 5: Write a file parser
• Step 6: Test the parser
• Is this the best solution?
• Conclusion
From

====== PASS BY OBJECT REFERENCE (Case in python): ======
2019-04-08
PASS BY OBJECT REFERENCE (case in Python): Here, "Object references are passed by value."

def append_one(li):
    li.append(1)

x = [0]
append_one(x)
print(x)

Here, the statement x = [0] makes a variable x (box) that points towards the object [0]. On the function being called, a new box li is created. The contents of li are the SAME as the contents of box x. Both boxes contain the same object. That is, both variables point to the same object in memory. Hence, any change to the object pointed at by li will also be reflected by the object pointed at by x.
In conclusion, the output of the above program will be:
[0, 1]
Note: If the variable li is reassigned in the function, then li will point to a separate object in memory. x, however, will continue pointing to the same object in memory it was pointing to earlier. Example:

def append_one(li):
    li = [0, 1]

x = [0]
append_one(x)
print(x)

The output of the program will be:
[0]
From

====== Plotly ======
2019-03-31
Plotly sample datasets:
1962_2006_walmart_store_openings.csv
2010_alcohol_consumption_by_country.csv
2011_february_aa_flight_paths.csv
2011_february_us_airport_traffic.csv

====== Python write to CSV ======
2019-03-29
Python write to CSV
import csv
with open(..., 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
From

with open(iniFile.absolute(), 'w', newline='') as iniSettings:
    #spamwriter = csv.writer(iniSettings, delimiter=',',
    #                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    spamwriter = csv.writer(iniSettings)
    #spamwriter.writerow(folder_list)
    folder_list.insert(0, rotation_list)
    for val in folder_list:
        spamwriter.writerow(val)

you can also use wr.writerows(list) – tovmeod Dec 25 '11 at 22:29
• Writerows seems to break up each element in the list into columns if each element is a list as well. This is pretty handy for outputting tables.
– whatnick Oct 7 '14 at 5:22
From

====== CSV in Python adding an extra carriage return, on Windows ======
CSV in Python adding an extra carriage return, on Windows
One of the possible fixes in Python 3, as described in @YiboYang's answer, is opening the file with the newline parameter set to an empty string:
f = open(path_to_file, 'w', newline='')
writer = csv.writer(f)
From

====== Examples of simple type checking in Python: ======
2019-02-17
Examples of simple type checking in Python:
assert type(variable_name) == int
assert type(variable_name) == bool
assert type(variable_name) == list
From

Use type:
>>> type(one)
You can use the __name__ attribute to get the name of the object. (This is one of the few special attributes that you need to use the __dunder__ name to get to - there's not even a method for it in the inspect module.)
>>> type(one).__name__
'int'
From

====== isinstance() ======
With one argument, type() returns the type of an object. The return value is a type object. The isinstance() built-in function is recommended for testing the type of an object.
From

• Syntax: isinstance(object, classinfo)
The isinstance() takes two parameters:
object : object to be checked
classinfo : class, type, or tuple of classes and types
From

====== graph-cli ======
2019-01-05
graph-cli
A CLI utility to create graphs from CSV files.
graph-cli is designed to be highly configurable for easy and detailed graph generation. It has many flags to acquire this detail and uses reasonable defaults to avoid bothering the user. It also leverages chaining, so you can create complex graphs from multiple CSV files.
From

====== copy2 ======
2018-12-25
copy2
As with the previous methods, the copy2 method is identical to the copy method, but in addition to copying the file contents it also attempts to preserve all the source file's metadata. If the platform doesn't allow for full metadata saving, then copy2 doesn't return failure and it will just preserve any metadata it can.
The syntax is as follows:
shutil.copy2(src_file, dest_file, *, follow_symlinks=True)
From

====== Start of String Only: \A ======
Start of String Only: \A
The \A anchor specifies that a match must occur at the beginning of the input string. It is identical to the ^ anchor, except that \A ignores the RegexOptions.Multiline option. Therefore, it can only match the start of the first line in a multiline input string.
From

====== Decimals interact well with much of the rest of Python ======
decimal — Decimal fixed point and floating point arithmetic
From

Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:
>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')
From

Decimals interact well with much of the rest of Python. Here is a small decimal floating point flying circus:
>>> data = list(map(Decimal, '1.34 1.87 3.45 2.35 1.00 0.03 9.25'.split()))
>>> max(data)
Decimal('9.25')
>>> min(data)
Decimal('0.03')
>>> sorted(data)
[Decimal('0.03'), Decimal('1.00'), Decimal('1.34'), Decimal('1.87'), Decimal('2.35'), Decimal('3.45'), Decimal('9.25')]
From

====== splitting a number into the integer and decimal parts ======
splitting a number into the integer and decimal parts
>>> a = 147.234
>>> a % 1
0.23400000000000887
>>> a // 1
147.0
If you want the integer part as an integer and not a float, use int(a//1) instead.
To obtain the tuple in a single expression: (int(a//1), a % 1)
EDIT: Remember that the decimal part of a float number is approximate, so if you want to represent it as a human would do, you need to use the decimal library.
From

import math
x = 1234.5678
math.modf(x)  # (0.5678000000000338, 1234.0)
From

Create a date object:
import datetime
x = datetime.datetime(2020, 5, 17)
From

====== Module datetime provides ======
The datetime module provides classes for manipulating date and time in a more object-oriented way. One of them is datetime.datetime.now, which returns the current moment; its timestamp() method gives the number of seconds since the epoch.
import datetime
ts = datetime.datetime.now().timestamp()
print(ts)  # 1545665588.52
From

x = int(datetime.datetime(2070, 12, 13, 1, 48, 35).timestamp() - datetime.datetime.now().timestamp()//1)
print(x)

====== Example 2: Right justify string and fill the remaining spaces ======
Example 2: Right justify string and fill the remaining spaces
# example string
string = 'cat'
width = 5
fillchar = '*'
# print right justified string
print(string.rjust(width, fillchar))
From

====== Practical Business Python ======
2018-12-24
Practical Business Python
pbpython/extras/Pathlib-Cheatsheet.pdf
From

====== The divmod() returns ======
2018-12-23
The divmod() returns:
• (q, r) - a pair of numbers (a tuple) consisting of quotient q and remainder r
From

====== numpy ======
2018-11-18
pip install numpy
From

pip3.6 install numpy
pip3.6 install scipy
pip3.6 install matplotlib
pip3.6 install opencv
Install opencv-python instead of cv2:
pip install opencv-python
From

====== compare the use of lambda ======
We can compare the use of lambda with that of def to create a function.
adder_lambda = lambda parameter1, parameter2: parameter1 + parameter2
def adder_regular(parameter1, parameter2):
    return parameter1 + parameter2
From

====== Key Functions ======
Key Functions
Both list.sort() and sorted() have a key parameter to specify a function to be called on each list element prior to making comparisons.
For example, here’s a case-insensitive string comparison:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
From

This image was created with the following code.

import operator
import pylab
from easydev import Timer

times1, times2, times3, times4 = [], [], [], []
pylab.clf()
d = {"Pierre": 42, "Anne": 33, "Zoe": 24}
for j in range(20):
    N = 1000000
    with Timer(times3):
        for i in range(N):
            sorted_d = sorted((key, value) for (key, value) in d.items())
    with Timer(times2):
        for i in range(N):
            sorted_d = sorted(d.items(), key=lambda x: x[1])
    with Timer(times1):
        for i in range(N):
            sorted_d = sorted(d.items(), key=operator.itemgetter(1))
    with Timer(times4):
        for i in range(N):
            sorted_d = [(k, v) for k, v in d.items()]
    print(j)
pylab.boxplot([times1, times2, times3, times4])
pylab.xticks([1, 2, 3, 4], ["operator", "lambda", "list comprehension and lambda", "py36"])
pylab.ylabel("Time (seconds) 1 million sorting \n (repeated 20 times)")
pylab.grid()
pylab.title("Performance sorted dictionary by values")
From

As already said, iteritems() will be a problem, but you mention a syntax error, which comes from the lambda declaration with parentheses:
Change: key=lambda(k, v): sort_order.index(k)
To: key=lambda k, v: sort_order.index(k)
From

====== What problem does pandas solve? ======
2018-11-15
What problem does pandas solve?
Python has long been great for data munging and preparation, but less so for data analysis and modeling.
pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R. From NumPy NumPy is the fundamental package for scientific computing with Python. It contains among other things: • a powerful N-dimensional array object • sophisticated (broadcasting) functions • tools for integrating C/C++ and Fortran code • useful linear algebra, Fourier transform, and random number capabilities From Array Broadcasting Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size. From ====== scikit-learn ====== scikit-learn Machine Learning in Python • Simple and efficient tools for data mining and data analysis • Accessible to everybody, and reusable in various contexts • Built on NumPy, SciPy, and matplotlib • Open source, commercially usable - BSD license From Welcome to PyBrain PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms. From ====== python read fails on special characters ====== 2018-11-06 python read fails on special characters with io.open(fileToSearch,'r',encoding='utf-8') as file: From An unrelated hint: have a look at the built-in function enumerate, which frees you from taking care of incrementing counter: You simply write for counter, line in enumerate(file): From ====== idle args ====== 2018-10-31 A number of IDEs support menu options to set the execution environment for programs under development and testing. In particular, it would be nice if IDLE let the user set command line arguments to be passed into sys.argv when running a script by pressing F5. 
Here are some existing implementations for reference:
* Wing-IDE: https://wingware.com/doc/intro/tutorial-debugging-launch
* Visual Studio: https://www.youtube.com/watch?v=IgbQCRHKV-Y
* PyCharm: https://www.jetbrains.com/pycharm/help/run-debug-configuration-python.html
This feature will help users interactively develop and test command-line tools while retaining all the nice features of the IDE. I would personally find it useful when teaching students about how sys.argv works.
From

Pending application of a patch, the following will work to only add args to sys.argv when running from an Idle editor.

import sys
# ...
if __name__ == '__main__':
    if 'idlelib.PyShell' in sys.modules:
        sys.argv.extend(('a', '-2'))  # add your arguments here.
    print(sys.argv)  # in use, parse sys.argv after extending it
    # ['C:\\Programs\\python34\\tem.py', 'a', '-2']
From

try:
    __file__
except:
    sys.argv = [sys.argv[0], 'argument1', 'argument2', 'argument2']
From

====== Auto detect IDLE and prompt for command-line argument values ======
2018-10-31
Auto detect IDLE and prompt for command-line argument values

#!/usr/bin/env python3
import sys

def ok(x=None):
    sys.argv.extend(e.get().split())
    root.destroy()

if 'idlelib.rpc' in sys.modules:
    import tkinter as tk
    root = tk.Tk()
    tk.Label(root, text="Command-line Arguments:").pack()
    e = tk.Entry(root)
    e.pack(padx=5)
    tk.Button(root, text="OK", command=ok, default=tk.ACTIVE).pack(pady=5)
    root.bind("<Return>", ok)  # event names were lost in the original paste; <Return>/<Escape> assumed
    root.bind("<Escape>", lambda x: root.destroy())
    e.focus()
From

====== print the files deleted ======
2018-10-30
Python Script
Here's a Python script that will also print the files deleted:

import os
for line in open("./data/deleted.files"):
    if line.isspace() or line[0] == '#':
        continue
    line = line.rstrip(os.linesep)
    try:
        if os.path.exists(line):
            print('File removed => ' + line)
            os.remove(line)
    except OSError:
        pass

====== delete directories ======
Here's an alternative Python script that is case sensitive and will also delete directories included in the list:

import os
import shutil

def exists_casesensitive(path):
    if not os.path.exists(path):
        return False
    directory, filename = os.path.split(path)
    return filename in os.listdir(directory)

with open("./data/deleted.files") as file:
    for line in file:
        line = line.strip()
        if line and not line.startswith('#'):
            path = line.rstrip(os.linesep)
            if exists_casesensitive(path):
                if os.path.isdir(path):
                    shutil.rmtree(path)
                    print('Directory removed => ' + path)
                else:
                    os.remove(path)
                    print('File removed => ' + path)
            else:
                #print('File not found => ' + path)
                pass
From

====== checkpoints ======
2018-10-06 GLOB
def delete_previous_checkpoints(self, num_previous=5):
    """
    Deletes all previous checkpoints that are before the present checkpoint.
    This is done to prevent blowing out of memory due to too many checkpoints
    :param num_previous:
    :return:
    """
    self.present_checkpoints = glob.glob(self.get_checkpoint_location() + '/*.ckpt')
    if len(self.present_checkpoints) > num_previous:
        present_ids = [self.__get_id(ckpt) for ckpt in self.present_checkpoints]
        present_ids.sort()
        ids_2_delete = present_ids[0:len(present_ids) - num_previous]
        for ckpt_id in ids_2_delete:
            ckpt_file_nm = self.get_checkpoint_location() + '/model_' + str(ckpt_id) + '.ckpt'
            os.remove(ckpt_file_nm)
From

====== argparse ======
2018-10-06
If you're doing anything more complicated than a script that accepts a few required positional arguments, you'll want to use a parser. Depending on your Python version, there are 3 available in the Python standard library (getopt, optparse and argparse), and argparse is by far the best.
From

====== Argparse Tutorial ======
Argparse Tutorial
author: Tshepang Lekhonkhobe
This tutorial is intended to be a gentle introduction to argparse, the recommended command-line parsing module in the Python standard library.
From

*args and **kwargs in Python
*args
The special syntax *args in function definitions in Python is used to pass a variable number of arguments to a function. It is used to pass a non-keyworded, variable-length argument list.
From

====== recursive ======
2018-10-03
In Python 3.5 and newer use the new recursive **/ functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive is set, ** followed by a path separator matches 0 or more subdirectories.
From

I have successfully used:
for i in d.rglob('**/*'):
for i in d.iglob('**/*'):
The “**” pattern means “this directory and all subdirectories, recursively”.
In other words, it enables recursive globbing.

From

errno.ENOTEMPTY: Directory not empty

From

errno.EACCES: Permission denied

From

<code python>
except OSError as e:
    if e.errno not in _IGNORED_ERROS:
        raise
    return False
</code>

From

<code python>
except OSError as e:
    if e.errno != EINVAL and strict:
        raise
</code>

From

====== walktree ======
2018-10-01

<code python>
import os, sys
from stat import *

def walktree(top, callback):
    '''recursively descend the directory tree rooted at top,
    calling the callback function for each regular file'''
    for f in os.listdir(top):
        pathname = os.path.join(top, f)
        mode = os.stat(pathname).st_mode
        if S_ISDIR(mode):
            # It's a directory, recurse into it
            walktree(pathname, callback)
        elif S_ISREG(mode):
            # It's a file, call the callback function
            callback(pathname)
        else:
            # Unknown file type, print a message
            print('Skipping %s' % pathname)

def visitfile(file):
    print('visiting', file)

if __name__ == '__main__':
    walktree(sys.argv[1], visitfile)
</code>

From

====== Dropbox in python ======
2018-09-29

Dropbox in python:

<code python>
from pathlib import Path
import arrow

filesPath = r"C:\scratch\removeThem"

criticalTime = arrow.now().shift(hours=+5).shift(days=-7)

for item in Path(filesPath).glob('*'):
    if item.is_file():
        print(str(item.absolute()))
        itemTime = arrow.get(item.stat().st_mtime)
        if itemTime < criticalTime:
            # remove it
            pass
</code>

From

In IDLE, go to Options -> Configure IDLE -> Keys and there select history-next and then history-previous to change the keys. Then click on Get New Keys for Selection and you are ready to choose whatever key combination you want.

From

====== CSV Toolkit Overview ======
2018-09-04

NOTE: This project has since been forked to the internal Prometheus Research, LLC tool props.csvtoolkit.

CSV Toolkit is a Python package that provides validation tooling and processing of CSV files. The validation tooling is based on the fantastic package Vladiate. The interface and extension mechanisms are similarly implemented as the rex.core extension mechanisms.

From
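Related to the file-age check in the "Dropbox in python" note above: the same cutoff logic works with the standard library alone, no arrow needed. A minimal sketch (the folder and file names are invented, and the stale file's mtime is backdated artificially so the check has something to find):

```python
import os
import tempfile
import time
from datetime import datetime, timedelta
from pathlib import Path

# Stdlib-only cutoff: flag files last modified more than 7 days ago
cutoff = datetime.now() - timedelta(days=7)

folder = Path(tempfile.mkdtemp())
(folder / "fresh.txt").write_text("new")
stale = folder / "stale.txt"
stale.write_text("old")

# Backdate the stale file's mtime by 30 days (atime, mtime)
old = time.time() - 30 * 24 * 3600
os.utime(stale, (old, old))

expired = [p.name for p in folder.glob('*')
           if p.is_file() and datetime.fromtimestamp(p.stat().st_mtime) < cutoff]
print(expired)  # ['stale.txt']
```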
====== What is Bonobo? ======

Bonobo is a lightweight Extract-Transform-Load (ETL) framework for Python 3.5+. It provides tools for building data transformation pipelines, using plain Python primitives, and executing them in parallel. Bonobo is the swiss army knife for everyday's data.

From

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

From

====== Awesome Python ======

Awesome Python: a curated list of awesome Python frameworks, libraries, software and resources. Inspired by awesome-php.

  * Awesome Python
    * Admin Panels
    * Algorithms and Design Patterns
    * Anti-spam
    * Asset Management
    * Audio
    * Authentication
    * Build Tools
    * AND MANY MORE

From

====== Python data visualization: Comparing 7 tools ======
2018-09-04

The Python scientific stack is fairly mature, and there are libraries for a variety of use cases, including machine learning and data analysis. Data visualization is an important part of being able to explore data and communicate results, but has lagged a bit behind other tools such as R in the past.

From

====== Best way to sort txt file using csv tools in python ======
2018-09-04

From

<code python>
import csv
import operator

# ========== Search by ID number. Return just the name fields for the student.
with open("studentinfo.txt", "r") as f:
    studentfileReader = csv.reader(f)
    id = input("Enter Id:")
    for row in studentfileReader:
        for field in row:
            if field == id:
                currentindex = row.index(id)
                print(row[currentindex + 1] + " " + row[currentindex + 2])

# ========== Sort by last name
with open("studentinfo.txt", "r") as f:
    studentfileReader = csv.reader(f)
    # sort the parsed rows, not the raw file lines
    sortedlist = sorted(studentfileReader, key=operator.itemgetter(0), reverse=True)
    print(sortedlist)
</code>

From

2018-08-10

The sys.path list contains the list of directories which will be searched for modules at runtime:

<code>
python -v
>>> import sys
>>> sys.path
['', '/usr/local/lib/python25.zip', '/usr/local/lib/python2.5', ... ]
</code>

From

For speedtest - /usr/local/lib
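Back to the CSV-sorting note above: the key point is to sort the rows produced by csv.reader, not the raw file lines, so itemgetter picks a whole field rather than a single character. A self-contained demonstration with invented student data (id, first name, last name), sorting by the last-name column:

```python
import csv
import io
from operator import itemgetter

# Hypothetical student data: id, first name, last name
data = io.StringIO(
    "3,Ada,Lovelace\n"
    "1,Grace,Hopper\n"
    "2,Alan,Turing\n"
)

reader = csv.reader(data)
# Sort parsed rows by the last-name field (index 2)
rows = sorted(reader, key=itemgetter(2))
print([r[2] for r in rows])  # ['Hopper', 'Lovelace', 'Turing']
```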