====== Python ======
2020-12-26
Wireshark tcpdump to neo4j plot
5.7.2. The “Export Packet Dissections” Dialog Box
This lets you save the packet list, packet details, and packet bytes as plain text, CSV, JSON, and other formats.
From
tshark -T json -r file.pcap
tshark -T json -j "http tcp ip" -x -r file.pcap
From
TShark is a network protocol analyzer. It lets you capture packet data from a live network, or read packets from a previously saved capture file, either printing a decoded form of those packets to the standard output or writing the packets to a file. TShark's native capture file format is pcapng format, which is also the format used by wireshark and various other tools.
From
tshark.exe" -T json -j "http tcp ip" -r "\\SERVER\Db\Mc\br0-2020-07-16-17-40.txt" > "\\SERVER\Db\Mc\test.txt"
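The JSON that tshark exports can be loaded straight into Python. A minimal sketch (the record below is a hypothetical, trimmed-down sample shaped like `tshark -T json` output) of pulling source/destination IP pairs out of the packet list, e.g. as edges for a neo4j plot:

```python
import json

# Hypothetical minimal records shaped like `tshark -T json` output.
raw = '''[
  {"_source": {"layers": {"ip": {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.2"}}}},
  {"_source": {"layers": {"arp": {}}}}
]'''

packets = json.loads(raw)
edges = []
for pkt in packets:
    layers = pkt.get('_source', {}).get('layers', {})
    if 'ip' in layers:  # skip non-IP packets (e.g. ARP)
        edges.append((layers['ip']['ip.src'], layers['ip']['ip.dst']))
print(edges)  # [('10.0.0.1', '10.0.0.2')]
```

For a real capture, replace `raw` with `open('file.json').read()` on the file produced by the tshark command above.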
====== How to efficiently parse fixed width files? ======
2020-08-15
Here's a way to do it with string slices, as you were considering but were concerned might get too ugly. The nice thing about it, besides not being all that ugly, is that it works unchanged in both Python 2 and 3, as well as being able to handle Unicode strings. Speed-wise it is, of course, slower than the versions based on the struct module, but could be sped up slightly by removing the ability to have padding fields.
try:
    from itertools import izip_longest  # added in Py 2.6
except ImportError:
    from itertools import zip_longest as izip_longest  # name change in Py 3.x
try:
    from itertools import accumulate  # added in Py 3.2
except ImportError:
    def accumulate(iterable):
        'Return running totals (simplified version).'
        total = next(iterable)
        yield total
        for value in iterable:
            total += value
            yield total

def make_parser(fieldwidths):
    cuts = tuple(cut for cut in accumulate(abs(fw) for fw in fieldwidths))
    pads = tuple(fw < 0 for fw in fieldwidths)  # bool values for padding fields
    flds = tuple(izip_longest(pads, (0,)+cuts, cuts))[:-1]  # ignore final one
    parse = lambda line: tuple(line[i:j] for pad, i, j in flds if not pad)
    # optional informational function attributes
    parse.size = sum(abs(fw) for fw in fieldwidths)
    parse.fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                               for fw in fieldwidths)
    return parse

line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
fieldwidths = (2, -10, 24)  # negative widths represent ignored padding fields
parse = make_parser(fieldwidths)
fields = parse(line)
print('format: {!r}, rec size: {} chars'.format(parse.fmtstring, parse.size))
print('fields: {}'.format(fields))
Output:
format: '2s 10x 24s', rec size: 36 chars
fields: ('AB', 'MNOPQRSTUVWXYZ0123456789')
From:
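For comparison, a sketch of the struct-based approach the answer alludes to, built from the same field widths (note that struct.unpack works on bytes, so the line is encoded first):

```python
import struct

fieldwidths = (2, -10, 24)  # negative widths are skipped padding ('x')
fmt = ''.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
              for fw in fieldwidths)
print(fmt)  # 2s10x24s

line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
fields = struct.unpack_from(fmt, line.encode('ascii'))
print(fields)  # (b'AB', b'MNOPQRSTUVWXYZ0123456789')
```

The fields come back as bytes here, so decode them if you need str.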
====== Python 3 always stores text strings as sequences of Unicode code points. ======
2020-08-08
Python 3 always stores text strings as sequences of Unicode code points. These are values in the range 0-0x10FFFF. They don’t always correspond directly to the characters you read on your screen, but that distinction doesn’t matter for most text manipulation tasks.
From
====== UCS-2 is UTF-16 ======
2020-08-08
UCS-2 is UTF-16, really, for any codepoint that was assigned when it was still called UCS-2 in any case.
Open it with encoding='utf16'. If there is no BOM (the Byte order mark, 2 bytes at the start, for BE that'd be \xfe\xff), then use encoding='utf_16_be' to force a byte order.
From
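A round-trip sketch (the temp-file path is just for illustration) showing that encoding='utf_16' writes a BOM and then uses it to pick the byte order when reading:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'demo_utf16.txt')

with open(path, 'w', encoding='utf_16') as f:  # a BOM is written automatically
    f.write('héllo')

with open(path, encoding='utf_16') as f:       # the BOM determines byte order
    text = f.read()

raw = open(path, 'rb').read()
print(text)     # héllo
print(raw[:2])  # the BOM: b'\xff\xfe' (LE) or b'\xfe\xff' (BE)
```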
There is a useful package in Python - chardet, which helps to detect the encoding used in your file. Actually, no program can say with 100% confidence which encoding was used - that's why chardet reports the encoding the file was most probably encoded with. Chardet can detect the following encodings:
• ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
• Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
• EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
• EUC-KR, ISO-2022-KR (Korean)
• KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
• ISO-8859-2, windows-1250 (Hungarian)
• ISO-8859-5, windows-1251 (Bulgarian)
• windows-1252 (English)
• ISO-8859-7, windows-1253 (Greek)
• ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
• TIS-620 (Thai)
From
You can install chardet with a pip command:
pip install chardet
import chardet
rawdata = open(file, "rb").read()
result = chardet.detect(rawdata)
charenc = result['encoding']
From
====== Attribute errno ======
2020-07-16
Attribute errno is defined only in OSError and classes inheriting from it.
So apparently line 88 is part of a try...except clause, and on that line you're trying to use e.errno. You can't do that if the exception doesn't belong to the OSError family.
From
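A quick illustration of the point above (the temp path is just to guarantee a missing file): errno exists on OSError and its subclasses, but not on exceptions outside that family.

```python
import errno
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), 'missing.txt')

is_enoent = False
try:
    open(missing)
except OSError as e:       # FileNotFoundError is an OSError subclass
    is_enoent = (e.errno == errno.ENOENT)
print(is_enoent)  # True

has_errno = True
try:
    int('not a number')
except ValueError as e:    # ValueError is outside the OSError family
    has_errno = hasattr(e, 'errno')
print(has_errno)  # False
```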
====== Classes: Make an empty file called __init__.py ======
2020-07-05
Classes
Make an empty file called __init__.py in the same directory as the files. That will signify to Python that it's "ok to import from this directory".
From
Same as previous, but prefix the module name with a . if not using a subdirectory:
from .user import User
from .dir import Dir
From
Python 3.3+ has Implicit Namespace Packages that allow creating packages without an __init__.py file.
Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely.
From
PEP 420 -- Implicit Namespace Packages
From
====== Python RegEx ======
2020-07-04
Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
Import the re module:
From
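A minimal example in the same tutorial style, checking a string against a pattern with re:

```python
import re

txt = "The rain in Spain"
matches = re.findall("ai", txt)  # every occurrence of "ai"
print(matches)  # ['ai', 'ai']

# does the string start with "The" and end with "Spain"?
found = bool(re.search(r"^The.*Spain$", txt))
print(found)  # True
```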
====== Discovering millions of datasets ======
2020-04-02
RAW DATA
https://datasetsearch.research.google.com/
Discovering millions of datasets on the web
Natasha Noy
Research Scientist, Google Research
Published Jan 23, 2020
Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on puppy cognition. Or if you like skiing, you could find data on revenue of ski resorts or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is. Over the past year, people have tried it out and provided feedback, and now Dataset Search is officially out of beta.
From
====== Python Cheatsheet ======
2020-04-01
Comprehensive Python Cheatsheet
Contents
1. Collections: List, Dictionary, Set, Tuple, Range, Enumerate, Iterator, Generator.
2. Types: Type, String, Regular_Exp, Format, Numbers, Combinatorics, Datetime.
3. Syntax: Args, Inline, Closure, Decorator, Class, Duck_Type, Enum, Exception.
4. System: Exit, Print, Input, Command_Line_Arguments, Open, Path, OS_Commands.
5. Data: JSON, Pickle, CSV, SQLite, Bytes, Struct, Array, Memory_View, Deque.
6. Advanced: Threading, Operator, Introspection, Metaprograming, Eval, Coroutines.
7. Libraries: Progress_Bar, Plot, Table, Curses, Logging, Scraping, Web, Profile,
NumPy, Image, Audio, Pygame.
From
====== Illustrated Guide to Python 3 ======
2020-01-19
Illustrated Guide to Python 3: A Complete Walkthrough of Beginning Python with Unique Illustrations Showing how Python Really Works. Now covering Python 3.6 (Treading on Python) (Volume 1) 2nd Edition
From
====== retrieve all groups for a specific domain ======
2019-09-17
Retrieve all groups for a domain or the account
To retrieve all groups for a specific domain or the account, use the following GET request and include the authorization described in Authorize requests. For the query strings, request, and response properties, see the API Reference. For readability, this example uses line returns:
GET https://www.googleapis.com/admin/directory/v1/groups?domain=domain name
&customer=my_customer or customerId&pageToken=pagination token
&maxResults=max results
When retrieving:
• All groups for a sub-domain — Use the domain argument with the domain's name.
• All groups for the account — Use the customer argument with either my_customer or the account's customerId value. As an account administrator, use the string my_customer to represent your account's customerId. If you are a reseller accessing a resold customer's account, use the resold account's customerId. To find the customerId value, use the account's primary domain name in the Retrieve all users in a domain operation's request. The resulting response has the customerId value.
• Using both domain and customer arguments — The API returns all the groups for the domain.
• Not using the domain and customer arguments — The API returns all the groups for the account associated with my_customer. This is the account customerId of the administrator making the API request.
From
====== APIs & Services ======
APIs & Services
Google API Dashboard
From
APIs Explorer
Learn more about using the Groups Settings API by reading the documentation.
From
Google API Client
This is the Python client library for Google's discovery based APIs. To get started, please see the docs folder.
These client libraries are officially supported by Google. However, the libraries are considered complete and are in maintenance mode. This means that we will address critical bugs and security issues but will not add any new features.
Installation
To install, simply use pip or easy_install:
pip install --upgrade google-api-python-client
From
Groups Settings API
Lets you manage permission levels and related settings of a group.
Documentation for the Groups Settings API in PyDoc.
samples/groupssettings Sample for the Groups Settings API
From
====== Algorithms ======
2019-08-28
Algorithms
by Jeff Erickson
🔥1st edition, June 2019 🔥
(Amazon links: US, UK, DE, ES, FR, IT, JP)
This web page contains a free electronic version of my self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.
From
====== You can still miss attachments ======
2019-08-23
You can still miss attachments by following @Ilya V. Schurov's or @Cam T's answers; the reason is that the email structure can differ based on the mimeType.
From
Gmail API: where to find body of email depending of mimeType
From
• Now with this service you can read your emails and read any attachments you may have in your e-mails.
• First you can query your e-mails with a search string to find the e-mail ids you need that have the attachments:
search_query = "ABCD"
results = service.users().messages().list(userId='me', q=search_query).execute()
msgs = results['messages']
msg_ids = [msg['id'] for msg in msgs]
• Now for each messageId you can find the associated attachments in the email.
From
payload.headers[] (list): List of headers on this message part. For the top-level message part, representing the entire message payload, it will contain the standard RFC 2822 email headers such as To, From, and Subject.
From
headers = messageheader["payload"]["headers"]
subject = [i['value'] for i in headers if i["name"] == "Subject"]
From
====== the Gmail API ======
the Gmail API
Complete the steps described in the rest of this page to create a simple Python command-line application that makes requests to the Gmail API.
From
Download Attachments from gmail using Gmail API
Remove all special characters, punctuation and spaces from string
Example 3
import re
re.sub(r'\W+', '', string)
• string1 - Result: 3.11899876595
• string2 - Result: 2.78014397621
From
====== Access Dates ======
Access Dates
and then access the data using a loop:
for msg in msgs['messages']:
    m_id = msg['id']  # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    header = payload['headers']
    for item in header:
        if item['name'] == 'Date':
            date = item['value']
            # ** DATA STORAGE FUNCTIONS ETC **
From
Python's strftime directives
Note: Examples are based on datetime.datetime(2013, 9, 30, 7, 6, 5)
From
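A few of the strftime directives applied to that same example datetime:

```python
import datetime

dt = datetime.datetime(2013, 9, 30, 7, 6, 5)
print(dt.strftime('%Y-%m-%d'))  # 2013-09-30
print(dt.strftime('%H:%M:%S'))  # 07:06:05
print(dt.strftime('%A'))        # Monday (name depends on locale)
```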
====== Sort a Dictionary ======
2019-08-22
Python : How to Sort a Dictionary by key or Value ?
From
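The short version, since Python 3.7 dicts keep insertion order: sort the items and rebuild the dict (sorting by value via a key function):

```python
d = {'b': 2, 'c': 1, 'a': 3}

by_key = dict(sorted(d.items()))                          # sort on the keys
by_value = dict(sorted(d.items(), key=lambda kv: kv[1]))  # sort on the values
print(by_key)    # {'a': 3, 'b': 2, 'c': 1}
print(by_value)  # {'c': 1, 'b': 2, 'a': 3}
```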
====== Plotly Cufflinks ======
2019-08-15
Interactive Plots with Plotly and Cufflinks on Pandas Dataframes
A simple and easy introduction to interactive visualisation with Plotly in python.
Ozan
Oct 8, 2018 · 4 min read
Pandas is one of the most preferred and widely used tools in Python for data analysis. It also has its own built-in plot function. However, when it comes to interactive visualization, Python users face some difficulties if they don't have front-end engineering skills, since libraries such as D3 and chart.js require some JavaScript background. This is where Plotly and Cufflinks come in handy.
From
====== Parsing text with Python ======
2019-07-11
Parsing text with Python
2018-01-07 · 2966 words · 14 minute read
python
programming · parsing · python
I hate parsing files, but it is something that I have had to do at the start of nearly every project. Parsing is not easy, and it can be a stumbling block for beginners. However, once you become comfortable with parsing files, you never have to worry about that part of the problem.
From
====== DataFrames ======
2019-08-14
Pandas Tutorial: DataFrames in Python
Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.
From
====== Backblaze ======
2019-07-11
B2 python SDK
Backblaze
This repository contains a client library and a few handy utilities for easy access to all of the capabilities of B2 Cloud Storage.
B2 command-line tool is an example of how it can be used to provide command-line access to the B2 service, but there are many possible applications (including FUSE filesystems, storage backend drivers for backup applications etc).
From
Backblaze is making two new APIs available that integrators and customers have been asking for: copy_file and copy_part. Together, the new functionality makes it easier to work with large files and to copy and manipulate files directly in B2.
From
====== Plotly ======
2019-07-01
import plotly.plotly as py
import plotly.graph_objs as go

data = [
    go.Scatter(
        x=[1, 2],
        y=[1, 2]
    )
]
layout = go.Layout(
    xaxis=dict(
        autorange='reversed'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='axes-reversed')
From
====== Regular Expression ======
2019-05-29
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
From
\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b
From
More efficient than re.findall() is re.finditer(regex, subject). It returns an iterator that enables you to loop over the regex matches in the subject string: for m in re.finditer(regex, subject). The for-loop variable m is a Match object with the details of the current match.
From
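A small example of the finditer pattern described above, collecting each match's text and position from the Match object:

```python
import re

subject = 'cat hat bat'
matches = [(m.group(), m.start()) for m in re.finditer(r'\b(\w)at\b', subject)]
print(matches)  # [('cat', 0), ('hat', 4), ('bat', 8)]
```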
RegexMagic: Regular Expression Generator
From
====== Split the string at the last occurrence of sep ======
2019-05-23
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
From
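Both behaviours described above in a quick example, splitting off a file extension at the last dot:

```python
path = 'archive.tar.gz'
head, sep, tail = path.rpartition('.')
print((head, sep, tail))  # ('archive.tar', '.', 'gz')

missing = 'no-sep'.rpartition('.')
print(missing)  # ('', '', 'no-sep') -- separator not found
```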
====== The built-in os module has a number of useful functions ======
The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.
From
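A sketch of os.scandir() getting names and sizes in one pass (the temp directory and file are created just for illustration):

```python
import os
import tempfile

# Hypothetical directory with one file, listed with os.scandir()
d = tempfile.mkdtemp()
with open(os.path.join(d, 'a.txt'), 'w') as f:
    f.write('hello')

# entry.stat() is cheap: scandir caches file metadata where the OS provides it
files = [(entry.name, entry.stat().st_size)
         for entry in os.scandir(d) if entry.is_file()]
print(files)  # [('a.txt', 5)]
```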
====== Splitting, Concatenating, and Joining Strings in Python ======
2019-05-20
Splitting, Concatenating, and Joining Strings in Python
From
====== Regex Testor ======
Regex Testor
https://regex101.com/
====== processdokuwikifile ======
2019-05-15
def processdokuwikifile(in_file, par_out_file):
    """lkjshflkjlsk"""
    #with open('C:\\Users\\An\\Desktop\\GoTo\\Listing2018-10-25-01-31-.txt','w') as outlog:
    with open(par_out_file, 'w', encoding='utf-8') as out_file:
        out = csv.writer(out_file)
        #with open('C:\\Users\\An\\Desktop\\GoTo\\Search2018-10-25-01-31-.txt','r') as log:
        with open(in_file, 'r', encoding='utf-8') as infile:
2019-05-07
For reference, the slide deck that I use to present on this topic is available here. All of the code and the sample text that I use is available in my Github repo here.
• Why parse files?
• The big picture
• Parsing text in standard format
• Parsing text using string methods
• Parsing text in complex format using regular expressions
• Step 1: Understand the input format
• Step 2: Import the required packages
• Step 3: Define regular expressions
• Step 4: Write a line parser
• Step 5: Write a file parser
• Step 6: Test the parser
• Is this the best solution?
• Conclusion
From
====== PASS BY OBJECT REFERENCE (Case in python): ======
2019-04-08
PASS BY OBJECT REFERENCE (Case in python):
Here, "Object references are passed by value."
def append_one(li):
    li.append(1)

x = [0]
append_one(x)
print(x)
Here, the statement x = [0] makes a variable x (box) that points towards the object [0]
On the function being called, a new box li is created. The contents of li is the SAME as the contents of box x. Both the boxes contain the same object. That is, both the variables point to the same object in memory. Hence, any change to the object pointed at by li will also be reflected by the object pointed at by x.
In conclusion, the output of the above program will be:
[0, 1]
Note:
If the variable li is reassigned in the function, then li will point to a separate object in memory. x, however, will continue pointing to the same object in memory it was pointing to earlier.
Example:
def append_one(li):
    li = [0, 1]

x = [0]
append_one(x)
print(x)
The output of the program will be:
[0]
From
====== Plotly ======
2019-03-31
Plotly
1962_2006_walmart_store_openings.csv
2010_alcohol_consumption_by_country.csv
2011_february_aa_flight_paths.csv
2011_february_us_airport_traffic.csv
From
====== Python write to CSV ======
2019-03-29
Python write to CSV
import csv
with open(..., 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)
From
with open(iniFile.absolute(), 'w', newline='') as iniSettings:
    #spamwriter = csv.writer(iniSettings, delimiter=',',
    #                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    #spamwriter = csv.writer(iniSettings, delimiter=',',
    #                        quotechar='"')
    spamwriter = csv.writer(iniSettings)
    #spamwriter.writerow(folder_list)
    folder_list.insert(0, rotation_list)
    for val in folder_list:
        spamwriter.writerow(val)
you can also use wr.writerows(list) – tovmeod Dec 25 '11 at 22:29
Writerows seems to break up each element in the list into columns if each element is a list as well. This is pretty handy for outputting tables. – whatnick Oct 7 '14 at 5:22
From
====== CSV in Python adding an extra carriage return, on Windows ======
CSV in Python adding an extra carriage return, on Windows
One of the possible fixes in Python3, as described in @YiboYang's answer, is opening the file with the newline parameter set to be an empty string:
f = open(path_to_file, 'w', newline='')
writer = csv.writer(f)
From
====== Examples of simple type checking in Python: ======
2019-02-17
Examples of simple type checking in Python:
assert type(variable_name) == int
assert type(variable_name) == bool
assert type(variable_name) == list
From
Use type
>>> type(one)
You can use the __name__ attribute to get the name of the object. (This is one of the few special attributes that you need to use the __dunder__ name to get to - there's not even a method for it in the inspect module.)
>>> type(one).__name__
'int'
From
====== isinstance() ======
With one argument, return the type of an object. The return value is a type object. The isinstance() built-in function is recommended for testing the type of an object, because it takes subclasses into account.
From
• Syntax:
isinstance(object, classinfo)
The isinstance() takes two parameters:
object : object to be checked
classinfo : class, type, or tuple of classes and types
From
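A few quick checks showing the syntax above, including why isinstance() is preferred over comparing type() (it respects subclassing):

```python
print(isinstance(3, int))             # True
print(isinstance('3', (int, float)))  # False -- classinfo can be a tuple
print(isinstance(True, int))          # True  -- bool is a subclass of int
print(type(True) == int)              # False -- type() ignores inheritance
```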
====== graph-cli ======
2019-01-05
graph-cli
A CLI utility to create graphs from CSV files.
graph-cli is designed to be highly configurable for easy and detailed graph generation. It has many flags to acquire this detail and uses reasonable defaults to avoid bothering the user. It also leverages chaining, so you can create complex graphs from multiple CSV files.
From
====== copy2 ======
2018-12-25
copy2
As with the previous methods, the copy2 method is identical to copy, but in addition to copying the file contents it also attempts to preserve all of the source file's metadata. If the platform doesn't allow full metadata saving, copy2 doesn't return failure; it just preserves whatever metadata it can.
The syntax is as follows:
shutil.copy2(src_file, dest_file, *, follow_symlinks=True)
From
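A sketch of the metadata-preserving behaviour (the temp files are hypothetical, created only to demonstrate): copy2 carries the source's modification time over to the destination.

```python
import os
import shutil
import tempfile

# Hypothetical files in a temp dir, to show copy2 carrying over the mtime.
d = tempfile.mkdtemp()
src = os.path.join(d, 'src.txt')
dst = os.path.join(d, 'dst.txt')
with open(src, 'w') as f:
    f.write('data')
os.utime(src, (1_000_000_000, 1_000_000_000))  # give src an old atime/mtime

shutil.copy2(src, dst)
print(os.path.getmtime(dst))  # ~1000000000.0 -- mtime preserved, unlike shutil.copy
```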
====== Start of String Only: \A ======
Start of String Only: \A
The \A anchor specifies that a match must occur at the beginning of the input string. It is identical to the ^ anchor, except that \A ignores the RegexOptions.Multiline option. Therefore, it can only match the start of the first line in a multiline input string.
From
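The passage above is written against .NET's regex options, but Python's re module behaves the same way: \A ignores MULTILINE, while ^ matches at every line start under it.

```python
import re

text = 'foo\nfoo'
multi = re.findall(r'^foo', text, re.MULTILINE)      # ^ matches each line start
anchored = re.findall(r'\Afoo', text, re.MULTILINE)  # \A ignores MULTILINE
print(multi)     # ['foo', 'foo']
print(anchored)  # ['foo']
```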
====== Decimals interact well with much of the rest of Python ======
decimal — Decimal fixed point and floating point arithmetic
From
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:
>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')
From
Decimals interact well with much of the rest of Python. Here is a small decimal floating point flying circus:
>>> data = list(map(Decimal, '1.34 1.87 3.45 2.35 1.00 0.03 9.25'.split()))
>>> max(data)
Decimal('9.25')
>>> min(data)
Decimal('0.03')
>>> sorted(data)
[Decimal('0.03'), Decimal('1.00'), Decimal('1.34'), Decimal('1.87'),
Decimal('2.35'), Decimal('3.45'), Decimal('9.25')]
From
====== splitting a number into the integer and decimal parts ======
splitting a number into the integer and decimal parts
>>> a = 147.234
>>> a % 1
0.23400000000000887
>>> a // 1
147.0
>>>
If you want the integer part as an integer and not a float, use int(a//1) instead. To obtain the tuple in a single passage: (int(a//1), a%1)
EDIT: Remember that the decimal part of a float number is approximate, so if you want to represent it as a human would do, you need to use the decimal library
From
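Following the EDIT above, a sketch of the same split done with the decimal library so the fractional part comes out exact instead of as 0.23400000000000887:

```python
from decimal import Decimal

a = Decimal('147.234')
int_part = int(a)         # 147
frac_part = a - int_part  # Decimal('0.234') -- exact, unlike binary floats
print(int_part, frac_part)
```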
import math
x = 1234.5678
math.modf(x) # (0.5678000000000338, 1234.0)
From
Create a date object:
import datetime
x = datetime.datetime(2020, 5, 17)
From
====== Module datetime provides ======
Module datetime provides classes for manipulating dates and times in a more object-oriented way. One of them is datetime.datetime.now; calling .timestamp() on the result returns the number of seconds since the epoch.
import datetime
ts = datetime.datetime.now().timestamp()
print(ts)
# 1545665588.52
From
x = int(datetime.datetime(2070, 12, 13, 1, 48, 35).timestamp() - datetime.datetime.now().timestamp()//1)
print(x)
====== Example 2: Right justify string and fill the remaining spaces ======
Example 2: Right justify string and fill the remaining spaces
# example string
string = 'cat'
width = 5
fillchar = '*'
# print right justified string
print(string.rjust(width, fillchar))
From
====== Practical Business Python ======
2018-12-24
Practical Business Python
pbpython/extras/Pathlib-Cheatsheet.pdf
From
====== The divmod() returns ======
2018-12-23
The divmod() returns
• (q, r) - a pair of numbers (a tuple) consisting of quotient q and remainder r
From
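For example, quotient and remainder in one call (it works for floats too):

```python
q, r = divmod(17, 5)    # 17 == 5*3 + 2
print(q, r)             # 3 2
print(divmod(7.5, 2))   # (3.0, 1.5)
```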
====== numpy ======
2018-11-18
pip install numpy
From
pip3.6 install numpy
pip3.6 install scipy
pip3.6 install matplotlib
pip3.6 install opencv
Install opencv-python instead of cv2.
pip install opencv-python
From
====== compare the use of lambda ======
We can compare the use of lambda with that of def to create a function.
adder_lambda = lambda parameter1,parameter2: parameter1+parameter2
def adder_regular(parameter1, parameter2): return parameter1+parameter2
From
====== Key Functions ======
Key Functions
Both list.sort() and sorted() have a key parameter to specify a function to be called on each list element prior to making comparisons.
For example, here’s a case-insensitive string comparison:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
From
This image was created with the following code.
import operator
import pylab
from easydev import Timer

times1, times2, times3, times4 = [], [], [], []
pylab.clf()
d = {"Pierre": 42, "Anne": 33, "Zoe": 24}
for j in range(20):
    N = 1000000
    with Timer(times3):
        for i in range(N):
            sorted_d = sorted((key, value) for (key, value) in d.items())
    with Timer(times2):
        for i in range(N):
            sorted_d = sorted(d.items(), key=lambda x: x[1])
    with Timer(times1):
        for i in range(N):
            sorted_d = sorted(d.items(), key=operator.itemgetter(1))
    with Timer(times4):
        for i in range(N):
            sorted_d = [(k, v) for k, v in d.items()]
    print(j)
pylab.boxplot([times1, times2, times3, times4])
pylab.xticks([1, 2, 3, 4], ["operator", "lambda", "list comprehension and lambda", "py36"])
pylab.ylabel("Time (seconds) 1 million sorting \n (repeated 20 times)")
pylab.grid()
pylab.title("Performance sorted dictionary by values")
From
As already said, iteritems() will be a problem, but you mention a syntax error, which comes from the lambda declaration with parentheses:
Change:
key=lambda(k, v): sort_order.index(k)
To:
key=lambda k, v: sort_order.index(k)
From
====== What problem does pandas solve? ======
2018-11-15
What problem does pandas solve?
Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.
From
NumPy
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
From
Array Broadcasting
Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size.
From
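A small illustration of broadcasting between mismatched shapes: a (3, 1) column and a (2,) row are stretched against each other to produce a (3, 2) result.

```python
import numpy as np

a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20])         # shape (2,)

c = a + b                      # b broadcast across rows, a across columns
print(c.shape)                 # (3, 2)
print(c.tolist())              # [[11, 21], [12, 22], [13, 23]]
```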
====== scikit-learn ======
scikit-learn
Machine Learning in Python
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody, and reusable in various contexts
• Built on NumPy, SciPy, and matplotlib
• Open source, commercially usable - BSD license
From
Welcome to PyBrain
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.
From
====== python read fails on special characters ======
2018-11-06
python read fails on special characters
with io.open(fileToSearch,'r',encoding='utf-8') as file:
From
An unrelated hint: have a look at the built-in function enumerate, which frees you from taking care of incrementing counter: You simply write for counter, line in enumerate(file):
From
====== idle args ======
2018-10-31
A number of IDEs support menu options to set the execution environment for programs under development and testing. In particular, it would be nice if IDLE let the user set command line arguments to be passed into sys.argv when running a script by pressing F5.
Here are some existing implementations for reference:
* Wing-IDE: https://wingware.com/doc/intro/tutorial-debugging-launch
* Visual Studio: https://www.youtube.com/watch?v=IgbQCRHKV-Y
* PyCharm: https://www.jetbrains.com/pycharm/help/run-debug-configuration-python.html
This feature will help users interactively develop and test command-line tools while retaining all the nice features of the IDE. I would personally find it useful when teaching students about how sys.argv works.
From
Pending application of a patch, the following will work to only add args to sys.argv when running from an Idle editor.
import sys
# ...
if __name__ == '__main__':
    if 'idlelib.PyShell' in sys.modules:
        sys.argv.extend(('a', '-2'))  # add your arguments here.
        print(sys.argv)  # in use, parse sys.argv after extending it
        # ['C:\\Programs\\python34\\tem.py', 'a', '-2']
From
try:
    __file__
except:
    sys.argv = [sys.argv[0], 'argument1', 'argument2', 'argument2']
From
====== Auto detect IDLE and prompt for command-line argument values ======
2018-10-31
Auto detect IDLE and prompt for command-line argument values
#! /usr/bin/env python3
import sys

def ok(x=None):
    sys.argv.extend(e.get().split())
    root.destroy()

if 'idlelib.rpc' in sys.modules:
    import tkinter as tk
    root = tk.Tk()
    tk.Label(root, text="Command-line Arguments:").pack()
    e = tk.Entry(root)
    e.pack(padx=5)
    tk.Button(root, text="OK", command=ok,
              default=tk.ACTIVE).pack(pady=5)
    root.bind("", ok)
    root.bind("", lambda x: root.destroy())
    e.focus()
From
====== print the files deleted ======
2018-10-30
Python Script
Here's a Python script that will also print the files deleted
import os

for line in open("./data/deleted.files"):
    if line.isspace() or line[0] == '#':
        continue
    line = line.rstrip(os.linesep)
    try:
        if os.path.exists(line):
            print('File removed => ' + line)
            os.remove(line)
    except OSError:
        pass
====== delete directories ======
Here's an alternative Python script that is case sensitive and will also delete directories included in the list
import os
import shutil

def exists_casesensitive(path):
    if not os.path.exists(path):
        return False
    directory, filename = os.path.split(path)
    return filename in os.listdir(directory)

with open("./data/deleted.files") as file:
    for line in file:
        line = line.strip()
        if line and not line.startswith('#'):
            path = line.rstrip(os.linesep)
            if exists_casesensitive(path):
                if os.path.isdir(path):
                    shutil.rmtree(path)
                    print('Directory removed => ' + path)
                else:
                    os.remove(path)
                    print('File removed => ' + path)
            else:
                #print('File not found => ' + path)
                pass
From
====== checkpoints ======
2018-10-06
GLOB
def delete_previous_checkpoints(self, num_previous=5):
    """
    Deletes all previous checkpoints that are before the present checkpoint.
    This is done to prevent blowing out of memory due to too many checkpoints
    :param num_previous:
    :return:
    """
    self.present_checkpoints = glob.glob(self.get_checkpoint_location() + '/*.ckpt')
    if len(self.present_checkpoints) > num_previous:
        present_ids = [self.__get_id(ckpt) for ckpt in self.present_checkpoints]
        present_ids.sort()
        ids_2_delete = present_ids[0:len(present_ids) - num_previous]
        for ckpt_id in ids_2_delete:
            ckpt_file_nm = self.get_checkpoint_location() + '/model_' + str(ckpt_id) + '.ckpt'
            os.remove(ckpt_file_nm)
From
====== argparse ======
2018-10-06
If you're doing anything more complicated than a script that accepts a few required positional arguments, you'll want to use a parser. Depending on your Python version, there are three available in the Python standard library (getopt, optparse, and argparse), and argparse is by far the best.
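A minimal argparse sketch (script names and flags are made up for illustration):

```python
import argparse

parser = argparse.ArgumentParser(description="Demo parser")
parser.add_argument("filename")                            # required positional
parser.add_argument("-n", "--count", type=int, default=1)  # typed option
parser.add_argument("-v", "--verbose", action="store_true")

# normally parse_args() reads sys.argv; a list is passed here for the demo
args = parser.parse_args(["data.txt", "-n", "3", "-v"])
print(args.filename, args.count, args.verbose)  # data.txt 3 True
```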
From
====== Argparse Tutorial ======
Argparse Tutorial
author: Tshepang Lekhonkhobe
This tutorial is intended to be a gentle introduction to argparse, the recommended command-line parsing module in the Python standard library.
From
*args and **kwargs in Python
*args
The special syntax *args in function definitions in Python is used to pass a variable number of arguments to a function: a non-keyword, variable-length argument list.
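A quick sketch of *args (and its keyword counterpart **kwargs); the function name is hypothetical:

```python
def summarize(*args, **kwargs):
    # args arrives as a tuple of positionals, kwargs as a dict of keywords
    return sum(args), sorted(kwargs)

total, keys = summarize(1, 2, 3, unit="m", label="demo")
print(total, keys)  # 6 ['label', 'unit']
```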
From
====== recursive ======
2018-10-03
In Python 3.5 and newer use the new recursive **/ functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive is set, ** followed by a path separator matches 0 or more subdirectories.
From
I have successfully used
for i in d.rglob('**/*'):
for i in d.iglob('**/*'):
The “**” pattern means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing:
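A small self-contained sketch of pathlib's recursive globbing; it builds a throwaway tree so the result is predictable. Note that rglob() prepends "**/" itself, so rglob('**/*') as used above is redundant (though harmless):

```python
import tempfile
from pathlib import Path

# build a tiny directory tree to glob over
tmp = Path(tempfile.mkdtemp())
(tmp / "sub").mkdir()
(tmp / "a.txt").write_text("x")
(tmp / "sub" / "b.txt").write_text("y")

# rglob("*.txt") is equivalent to glob("**/*.txt")
found = sorted(p.name for p in tmp.rglob("*.txt"))
print(found)  # ['a.txt', 'b.txt']
```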
From
errno.ENOTEMPTY
Directory not empty
From
errno.EACCES¶
Permission denied
From
except OSError as e:
if e.errno not in _IGNORED_ERROS:
raise
return False
From
except OSError as e:
if e.errno != EINVAL and strict:
raise
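The fragments above come from error-handling code that inspects e.errno. A hedged, self-contained sketch of the same pattern (the helper name is made up); on most systems removing a non-empty directory raises ENOTEMPTY:

```python
import errno
import os
import tempfile

def try_rmdir(path):
    """Return True if removed; False on ENOTEMPTY/EACCES; re-raise otherwise."""
    try:
        os.rmdir(path)
        return True
    except OSError as e:
        if e.errno in (errno.ENOTEMPTY, errno.EACCES):
            return False
        raise

d = tempfile.mkdtemp()
open(os.path.join(d, "f"), "w").close()  # make the directory non-empty
print(try_rmdir(d))
```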
From
====== walktree ======
2018-10-01
import os, sys
from stat import *

def walktree(top, callback):
    '''recursively descend the directory tree rooted at top,
    calling the callback function for each regular file'''
    for f in os.listdir(top):
        pathname = os.path.join(top, f)
        mode = os.stat(pathname).st_mode
        if S_ISDIR(mode):
            # It's a directory, recurse into it
            walktree(pathname, callback)
        elif S_ISREG(mode):
            # It's a file, call the callback function
            callback(pathname)
        else:
            # Unknown file type, print a message
            print('Skipping %s' % pathname)

def visitfile(file):
    print('visiting', file)

if __name__ == '__main__':
    walktree(sys.argv[1], visitfile)
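The same traversal can be done with the standard os.walk, which handles the recursion itself; a minimal sketch (function name is made up):

```python
import os

def walk_files(top, callback):
    # os.walk yields (dirpath, dirnames, filenames) for every directory,
    # so no explicit recursion is needed
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            callback(os.path.join(dirpath, name))

# usage: walk_files('.', print)
```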
From
====== Dropbox in python ======
2018-09-29
Dropbox in python
from pathlib import Path
import arrow
filesPath = r"C:\scratch\removeThem"
criticalTime = arrow.now().shift(hours=+5).shift(days=-7)
for item in Path(filesPath).glob('*'):
    if item.is_file():
        print(str(item.absolute()))
        itemTime = arrow.get(item.stat().st_mtime)
        if itemTime < criticalTime:
            # remove it
            pass
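A stdlib-only variant of the same cleanup idea, without the arrow dependency (function name and age threshold are illustrative):

```python
import time
from pathlib import Path

def remove_old_files(files_path, max_age_days=7):
    """Delete plain files older than max_age_days; return their names."""
    cutoff = time.time() - max_age_days * 24 * 3600
    removed = []
    for item in Path(files_path).glob('*'):
        if item.is_file() and item.stat().st_mtime < cutoff:
            item.unlink()
            removed.append(item.name)
    return removed
```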
From
In IDLE, go to Options -> Configure IDLE -> Keys and there select history-next and then history-previous to change the keys.
Then click on Get New Keys for Selection and you are ready to choose whatever key combination you want.
From
====== CSV Toolkit Overview ======
2018-09-04
CSV Toolkit Overview
NOTE: THIS PROJECT HAS SINCE BEEN FORKED TO THE INTERNAL PROMETHEUS RESEARCH, LLC TOOL PROPS.CSVTOOLKIT
CSV Toolkit is a Python package that provides validation tooling and processing of CSV files. The validation tooling is based on the fantastic package Vladiate. The interface and extension mechanisms are similarly implemented as the rex.core extension mechanisms.
From
====== What is Bonobo? ======
What is Bonobo?
Bonobo is a lightweight Extract-Transform-Load (ETL) framework for Python 3.5+.
It provides tools for building data transformation pipelines, using plain python primitives, and executing them in parallel.
Bonobo is the Swiss Army knife for everyday data.
From
csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
From
====== Awesome Python ======
Awesome Python
A curated list of awesome Python frameworks, libraries, software and resources.
Inspired by awesome-php.
• Awesome Python
○ Admin Panels
○ Algorithms and Design Patterns
○ Anti-spam
○ Asset Management
○ Audio
○ Authentication
○ Build Tools
○ AND MANY MORE
From
====== Python data visualization: Comparing 7 tools ======
2018-09-04
Python data visualization: Comparing 7 tools
The Python scientific stack is fairly mature, and there are libraries for a variety of use cases, including machine learning, and data analysis. Data visualization is an important part of being able to explore data and communicate results, but has lagged a bit behind other tools such as R in the past.
From
====== Best way to sort txt file using csv tools in python ======
2018-09-04
Best way to sort txt file using csv tools in python
From
import csv
import operator

# ========== Search by ID number. Return just the name fields for the student
with open("studentinfo.txt", "r") as f:
    studentfileReader = csv.reader(f)
    id = input("Enter Id:")
    for row in studentfileReader:
        for field in row:
            if field == id:
                currentindex = row.index(id)
                print(row[currentindex + 1] + " " + row[currentindex + 2])

# ========= Sort by last name
with open("studentinfo.txt", "r") as f:
    studentfileReader = csv.reader(f)
    # sort the parsed rows (not the raw file object) by the first field;
    # sorting f directly would order lines by their first characters
    sortedlist = sorted(studentfileReader, key=operator.itemgetter(0), reverse=True)
    print(sortedlist)
From
2018-08-10
The sys.path list contains the list of directories which will be searched for modules at runtime:
python -v
>>> import sys
>>> sys.path
['', '/usr/local/lib/python25.zip', '/usr/local/lib/python2.5', ... ]
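Since sys.path is an ordinary list, it can also be extended at runtime to make extra directories importable; a small sketch (the directory is hypothetical):

```python
import sys

# Appended directories are searched after the existing entries
sys.path.append('/opt/myproject/lib')
print('/opt/myproject/lib' in sys.path)  # True
```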
From
For speedtest - /usr/local/lib