Python

————————————————————————————————————————————————

2020-12-26

Wireshark tcpdump to neo4j plot

5.7.2. The “Export Packet Dissections” Dialog Box: this lets you save the packet list, packet details, and packet bytes as plain text, CSV, JSON, and other formats.

From <https://www.wireshark.org/docs/wsug_html_chunked/ChIOExportSection.html>

tshark -T json -r file.pcap
tshark -T json -j "http tcp ip" -x -r file.pcap

From <https://ask.wireshark.org/question/12850/command-line-tshark-json-and-packet-details-all-expanded/>

TShark is a network protocol analyzer. It lets you capture packet data from a live network, or read packets from a previously saved capture file, either printing a decoded form of those packets to the standard output or writing the packets to a file. TShark's native capture file format is pcapng format, which is also the format used by wireshark and various other tools.

From <https://www.wireshark.org/docs/man-pages/tshark.html>

  tshark.exe -T json -j "http tcp ip" -r "\\SERVER\Db\Mc\br0-2020-07-16-17-40.txt" > "\\SERVER\Db\Mc\test.txt"
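The exported JSON can then be post-processed with Python's standard json module. A minimal sketch; the packet below is a hand-made stand-in shaped like tshark -T json output, not real capture data:

```python
import json

# Hand-made stand-in for tshark -T json output: a JSON array of packets,
# each with a _source.layers dict keyed by protocol name (assumed shape).
sample = '''
[
  {"_source": {"layers": {
      "ip":  {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.2"},
      "tcp": {"tcp.srcport": "443", "tcp.dstport": "52100"}}}}
]
'''

packets = json.loads(sample)
for pkt in packets:
    layers = pkt['_source']['layers']
    print(layers['ip']['ip.src'], '->', layers['ip']['ip.dst'])
```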

How to efficiently parse fixed width files?

2020-08-15

Here's a way to do it with string slices, as you were considering but were concerned might get too ugly. The nice thing about it, besides not being all that ugly, is that it works unchanged in both Python 2 and 3, and it can handle Unicode strings. Speed-wise it is, of course, slower than the versions based on the struct module, but could be sped up slightly by removing the ability to have padding fields.

try:
    from itertools import izip_longest  # added in Py 2.6
except ImportError:
    from itertools import zip_longest as izip_longest  # name change in Py 3.x
try:
    from itertools import accumulate  # added in Py 3.2
except ImportError:
    def accumulate(iterable):
        'Return running totals (simplified version).'
        total = next(iterable)
        yield total
        for value in iterable:
            total += value
            yield total
def make_parser(fieldwidths):
    cuts = tuple(cut for cut in accumulate(abs(fw) for fw in fieldwidths))
    pads = tuple(fw < 0 for fw in fieldwidths) # bool values for padding fields
    flds = tuple(izip_longest(pads, (0,)+cuts, cuts))[:-1]  # ignore final one
    parse = lambda line: tuple(line[i:j] for pad, i, j in flds if not pad)
    # optional informational function attributes
    parse.size = sum(abs(fw) for fw in fieldwidths)
    parse.fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                                                for fw in fieldwidths)
    return parse
line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
fieldwidths = (2, -10, 24)  # negative widths represent ignored padding fields
parse = make_parser(fieldwidths)
fields = parse(line)
print('format: {!r}, rec size: {} chars'.format(parse.fmtstring, parse.size))
print('fields: {}'.format(fields))
Output:
format: '2s 10x 24s', rec size: 36 chars
fields: ('AB', 'MNOPQRSTUVWXYZ0123456789')

From: <https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files>
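For comparison, the struct-based version the answer alludes to can be sketched like this. Note that struct works on bytes, and the format '2s 10x 24s' mirrors parse.fmtstring from the code above:

```python
import struct

# '2s 10x 24s' mirrors parse.fmtstring above:
# keep 2 bytes, skip 10 padding bytes, keep 24 bytes (36 bytes total)
fieldstruct = struct.Struct('2s 10x 24s')

line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
raw_fields = fieldstruct.unpack_from(line.encode('ascii'))
print(raw_fields)   # (b'AB', b'MNOPQRSTUVWXYZ0123456789')
```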

Python 3 always stores text strings as sequences of Unicode code points.

2020-08-08

Python 3 always stores text strings as sequences of Unicode code points. These are values in the range 0-0x10FFFF. They don’t always correspond directly to the characters you read on your screen, but that distinction doesn’t matter for most text manipulation tasks.

From <http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html>

UCS-2 is UTF-16

2020-08-08

UCS-2 is UTF-16, really, for any codepoint that was assigned when it was still called UCS-2 in any case. Open it with encoding='utf16'. If there is no BOM (the Byte order mark, 2 bytes at the start, for BE that'd be \xfe\xff), then use encoding='utf_16_be' to force a byte order.

From <https://stackoverflow.com/questions/14488346/python-3-reading-ucs-2-be-file>
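A small, self-contained illustration of that BOM behaviour:

```python
# b'\xfe\xff' is the big-endian BOM; the 'utf16' codec consumes it
# and picks the right byte order automatically
bom_text = b'\xfe\xff\x00H\x00i'.decode('utf16')
print(bom_text)                              # Hi

# with no BOM, force the byte order explicitly
be_text = b'\x00H\x00i'.decode('utf_16_be')
print(be_text)                               # Hi
```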

There is a useful package in Python - chardet - which helps to detect the encoding used in your file. Actually there is no program that can say with 100% confidence which encoding was used - that's why chardet gives the encoding with the highest probability the file was encoded with. Chardet can detect the following encodings:

• ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
• Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
• EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
• EUC-KR, ISO-2022-KR (Korean)
• KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
• ISO-8859-2, windows-1250 (Hungarian)
• ISO-8859-5, windows-1251 (Bulgarian)
• windows-1252 (English)
• ISO-8859-7, windows-1253 (Greek)
• ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
• TIS-620 (Thai)

From <https://riptutorial.com/encoding/example/23227/how-to-detect-the-encoding-of-a-text-file-with-python>

You can install chardet with pip: pip install chardet

import chardet
rawdata = open(file, 'rb').read()
result = chardet.detect(rawdata)
charenc = result['encoding']

From <https://riptutorial.com/encoding/example/23227/how-to-detect-the-encoding-of-a-text-file-with-python>

Attribute errno

2020-07-16

The errno attribute is defined only on OSError and the classes inheriting from it. So apparently line 88 is part of a try…except clause, and in that line you're trying to use e.errno. You can't do that if the exception doesn't belong to the OSError family.

From <https://stackoverflow.com/questions/48541077/exceptions-runtimeerror-object-has-no-attribute-errno>
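A short demonstration: FileNotFoundError inherits from OSError, so errno is available there, while a ValueError has no such attribute (the path below is hypothetical):

```python
import errno

try:
    open('/no/such/file/hopefully.txt')     # hypothetical missing path
except OSError as e:                        # FileNotFoundError subclasses OSError
    code = e.errno
print(code == errno.ENOENT)                 # True

try:
    int('not a number')
except ValueError as e:                     # not in the OSError family
    has_errno = hasattr(e, 'errno')
print(has_errno)                            # False
```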

Classes: Make an empty file called __init__.py

2020-07-05

Classes: Make an empty file called __init__.py in the same directory as the files. That will signify to Python that it's “ok to import from this directory”.

From <https://stackoverflow.com/questions/4142151/how-to-import-the-class-within-the-same-directory-or-sub-directory>

Same as previous, but prefix the module name with a . if not using a subdirectory: from .user import User from .dir import Dir

From <https://stackoverflow.com/questions/4142151/how-to-import-the-class-within-the-same-directory-or-sub-directory>

Python 3.3+ has Implicit Namespace Packages that allow it to create packages without an __init__.py file.

Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely.

From <https://stackoverflow.com/questions/37139786/is-init-py-not-required-for-packages-in-python-3-3>

PEP 420 – Implicit Namespace Packages

From <https://www.python.org/dev/peps/pep-0420/>
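A quick sketch of PEP 420 in action: a directory with no __init__.py at all is importable on Python 3.3+. The mypkg/greet names below are made up for the example:

```python
import pathlib
import sys
import tempfile

# build a throwaway package directory -- note: no __init__.py anywhere
root = pathlib.Path(tempfile.mkdtemp())
(root / 'mypkg').mkdir()
(root / 'mypkg' / 'greet.py').write_text('def hello():\n    return "hi"\n')

sys.path.insert(0, str(root))
from mypkg import greet          # mypkg is treated as a namespace package

print(greet.hello())             # hi
```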

Python RegEx

2020-07-04

Python RegEx

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

RegEx Module Python has a built-in package called re, which can be used to work with Regular Expressions. Import the re module:

From <https://www.w3schools.com/python/python_regex.asp>
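A minimal use of the re module along those lines:

```python
import re

text = 'The rain in Spain'

m = re.search(r'ai', text)         # Match object for the first occurrence (or None)
print(m.start())                   # 5

matches = re.findall(r'ai', text)  # every non-overlapping match
print(matches)                     # ['ai', 'ai']
```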

Discovering millions of datasets

2020-04-02

RAW DATA: https://datasetsearch.research.google.com/

Discovering millions of datasets on the web. Natasha Noy, Research Scientist, Google Research. Published Jan 23, 2020.

Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on puppy cognition. Or if you like skiing, you could find data on revenue of ski resorts or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is. Over the past year, people have tried it out and provided feedback, and now Dataset Search is officially out of beta.

From <https://blog.google/products/search/discovering-millions-datasets-web/?utm_source=hackernewsletter&utm_medium=email&utm_term=data>

Python Cheatsheet

2020-04-01

Comprehensive Python Cheatsheet

Contents:
 1. Collections: List, Dictionary, Set, Tuple, Range, Enumerate, Iterator, Generator.
 2. Types: Type, String, Regular_Exp, Format, Numbers, Combinatorics, Datetime.
 3. Syntax: Args, Inline, Closure, Decorator, Class, Duck_Type, Enum, Exception.
 4. System: Exit, Print, Input, Command_Line_Arguments, Open, Path, OS_Commands.
 5. Data: JSON, Pickle, CSV, SQLite, Bytes, Struct, Array, Memory_View, Deque.
 6. Advanced: Threading, Operator, Introspection, Metaprogramming, Eval, Coroutines.
 7. Libraries: Progress_Bar, Plot, Table, Curses, Logging, Scraping, Web, Profile, NumPy, Image, Audio, Pygame.

From <https://github.com/gto76/python-cheatsheet?utm_source=hackernewsletter&utm_medium=email&utm_term=code>

Illustrated Guide to Python 3

2020-01-19

Illustrated Guide to Python 3: A Complete Walkthrough of Beginning Python with Unique Illustrations Showing how Python Really Works. Now covering Python 3.6 (Treading on Python) (Volume 1) 2nd Edition

From <https://www.amazon.com/Illustrated-Guide-Python-Walkthrough-Illustrations/dp/1977921752?SubscriptionId=AKIAIGH7TZJVBZLN4QSQ&tag=peterbecom-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1977921752>

retrieve all groups for a specific domain

2019-09-17

Retrieve all groups for a domain or the account To retrieve all groups for a specific domain or the account, use the following GET request and include the authorization described in Authorize requests. For the query strings, request, and response properties, see the API Reference. For readability, this example uses line returns:

GET https://www.googleapis.com/admin/directory/v1/groups?domain=domain name
&customer=my_customer or customerId&pageToken=pagination token
&maxResults=max results

When retrieving:

 • All groups for a sub-domain — Use the domain argument with the domain's name.
 • All groups for the account — Use the customer argument with either my_customer or the account's customerId value. As an account administrator, use the string my_customer to represent your account's customerId. If you are a reseller accessing a resold customer's account, use the resold account's customerId. To get the customerId value, use the account's primary domain name in the Retrieve all users in a domain operation's request. The resulting response has the customerId value.
 • Using both domain and customer arguments — The API returns all the groups for the domain.
 • Not using the domain and customer arguments — The API returns all the groups for the account associated with my_customer. This is the account customerId of the administrator making the API request.

From <https://developers.google.com/admin-sdk/directory/v1/guides/manage-groups>

APIs & Services

Google API Dashboard

From <https://console.developers.google.com/apis/dashboard?project=quickstart-1566368013602&authuser=0&pli=1>

APIs Explorer

Learn more about using the Groups Settings API by reading the documentation.

From <https://developers.google.com/apis-explorer/#p/groupssettings/v1/>

Google API Client: This is the Python client library for Google's discovery based APIs. To get started, please see the docs folder. These client libraries are officially supported by Google. However, the libraries are considered complete and are in maintenance mode. This means that we will address critical bugs and security issues but will not add any new features. Installation: to install, simply use pip or easy_install:

pip install --upgrade google-api-python-client

From <https://github.com/googleapis/google-api-python-client>

Groups Settings API Lets you manage permission levels and related settings of a group. Documentation for the Groups Settings API in PyDoc. samples/groupssettings Sample for the Groups Settings API

From <https://github.com/googleapis/google-api-python-client/tree/master/samples>

Algorithms

2019-08-28

Algorithms by Jeff Erickson, 1st edition, June 2019. This web page contains a free electronic version of my self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.

From <http://jeffe.cs.illinois.edu/teaching/algorithms/#book>

You can still miss attachments

2019-08-23

You can still miss attachments by following @Ilya V. Schurov's or @Cam T's answers; the reason is that the email structure can differ based on the mimeType.

From <https://stackoverflow.com/questions/25832631/download-attachments-from-gmail-using-gmail-api>

Gmail API: where to find body of email depending of mimeType

From <https://stackoverflow.com/questions/37445865/gmail-api-where-to-find-body-of-email-depending-of-mimetype#37463491>

 • Now with this service you can read your emails and read any attachments you may have in your e-mails.
 • First you can query your e-mails with a search string to find the e-mail ids you need that have the attachments:

 search_query = "ABCD"
 result = service.users().messages().list(userId='me', q=search_query).execute()
 msgs = result['messages']
 msg_ids = [msg['id'] for msg in msgs]

 • Now for each messageId you can find the associated attachments in the email.

From <https://stackoverflow.com/questions/41749236/download-a-csv-file-from-gmail-using-python>

 payload.headers[] (list): List of headers on this message part. For the top-level message part, representing the entire message payload, it will contain the standard RFC 2822 email headers such as To, From, and Subject.

From <https://developers.google.com/gmail/api/v1/reference/users/messages>

 headers=messageheader["payload"]["headers"]
 subject= [i['value'] for i in headers if i["name"]=="Subject"]

From <https://stackoverflow.com/questions/55144261/python-how-to-get-the-subject-of-an-email-from-gmail-api>
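The same header scan, runnable against a hand-made stand-in for the messages.get() response (the dict below is an assumed shape following the payload.headers[] description above):

```python
# Hand-made stand-in for users().messages().get(...).execute() (assumed shape)
messageheader = {
    'payload': {
        'headers': [
            {'name': 'From', 'value': 'alice@example.com'},
            {'name': 'Subject', 'value': 'Quarterly report'},
        ]
    }
}

headers = messageheader['payload']['headers']
subject = [i['value'] for i in headers if i['name'] == 'Subject']
print(subject)   # ['Quarterly report']
```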

the Gmail API

Complete the steps described in the rest of this page to create a simple Python command-line application that makes requests to the Gmail API.

From <https://developers.google.com/gmail/api/quickstart/python#step_4_run_the_sample>

Download Attachments from Gmail using the Gmail API

Remove all special characters, punctuation and spaces from string

Example 3

import re
re.sub(r'\W+', '', string)
	• string1 - Result: 3.11899876595
	• string2 - Result: 2.78014397621

From <https://stackoverflow.com/questions/5843518/remove-all-special-characters-punctuation-and-spaces-from-string>

Access Dates

and then access the data using a loop:

for msg in msgs['messages']:
    m_id = msg['id']  # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    header = payload['headers']
    for item in header:
        if item['name'] == 'Date':
            date = item['value']
            # ... data storage functions etc. ...

From <https://stackoverflow.com/questions/46615395/gmail-api-quickly-access-the-dates-of-every-email-ever-sent-received>

Python's strftime directives. Note: examples are based on datetime.datetime(2013, 9, 30, 7, 6, 5).

From <http://strftime.org/>
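A few common directives applied to that example date:

```python
import datetime

d = datetime.datetime(2013, 9, 30, 7, 6, 5)
print(d.strftime('%Y-%m-%d'))   # 2013-09-30
print(d.strftime('%H:%M:%S'))   # 07:06:05
print(d.strftime('%d/%m/%Y'))   # 30/09/2013
```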

Sort a Dictionary

2019-08-22

Python : How to Sort a Dictionary by key or Value ?

From <https://thispointer.com/python-how-to-sort-a-dictionary-by-key-or-value/>
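The two usual patterns, as a quick sketch (the example dict is made up):

```python
prices = {'pear': 3, 'apple': 2, 'fig': 5}

by_key = dict(sorted(prices.items()))                          # sort by key
by_value = dict(sorted(prices.items(), key=lambda kv: kv[1]))  # sort by value

print(by_key)    # {'apple': 2, 'fig': 5, 'pear': 3}
print(by_value)  # {'apple': 2, 'pear': 3, 'fig': 5}
```

dict preserves insertion order in Python 3.7+, so rebuilding a dict from the sorted items keeps that order.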

Plotly Cufflinks

2019-08-15

Interactive Plots with Plotly and Cufflinks on Pandas Dataframes A simple and easy introduction to interactive visualisation with Plotly in python.

Ozan, Oct 8, 2018 · 4 min read. Pandas is one of the most preferred and widely used tools in Python for data analysis. It also has its own built-in sample plot function. However, when it comes to interactive visualization, Python users face some difficulties if they don't have front-end engineering skills, since many libraries such as D3 and Chart.js require some JavaScript background. This is where Plotly and Cufflinks come in handy.

From <https://medium.com/@ozan/interactive-plots-with-plotly-and-cufflinks-on-pandas-dataframes-af6f86f62d94>

Parsing text with Python

2019-07-11

Parsing text with Python (2018-01-07): I hate parsing files, but it is something that I have had to do at the start of nearly every project. Parsing is not easy, and it can be a stumbling block for beginners. However, once you become comfortable with parsing files, you never have to worry about that part of the problem.

From <https://www.vipinajayakumar.com/parsing-text-with-python/>

DataFrames

2019-08-14

Pandas Tutorial: DataFrames in Python Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.

From <https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python>

Backblaze

2019-07-11

B2 python SDK

Backblaze This repository contains a client library and a few handy utilities for easy access to all of the capabilities of B2 Cloud Storage. B2 command-line tool is an example of how it can be used to provide command-line access to the B2 service, but there are many possible applications (including FUSE filesystems, storage backend drivers for backup applications etc).

From <https://github.com/Backblaze/b2-sdk-python/tree/master?utm_campaign=Newsletter-B2&utm_medium=email&_hsenc=p2ANqtz-933CaN9zTTwQD8oSR0sbDcpfTXIqfjs03KzOMqNB9Q9g7grroQTWRtUHD58cdprg1KrCGAIm0wSrQAVXoLWNFlxPlhug&_hsmi=74285577&utm_content=74285577&utm_source=hs_email&hsCtaTracking=1e01a8c6-0568-4a44-b739-4dbbcccda871%7Cc331f25d-0ec0-48b0-8b7d-6a15b9112aec>

Backblaze is making two new APIs available that integrators and customers have been asking for: copy_file and copy_part. Together, the new functionality makes it easier to work with large files and to copy and manipulate files directly in B2.

From <https://mail.google.com/mail/u/0/?zx=lgfx6lxl1rpm#inbox>

Plotly

2019-07-01

import plotly.plotly as py
import plotly.graph_objs as go
data = [
    go.Scatter(
        x=[1, 2],
        y=[1, 2]
    )
]
layout = go.Layout(
    xaxis=dict(
        autorange='reversed'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='axes-reversed')

From <https://plot.ly/python/axes/>

Regular Expression

2019-05-29

  ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";

From <https://stackoverflow.com/questions/10086572/ip-address-validation-in-python-using-regex>

\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b

From <https://www.regular-expressions.info/ip.html>

More efficient than re.findall() is re.finditer(regex, subject). It returns an iterator that enables you to loop over the regex matches in the subject string: for m in re.finditer(regex, subject). The for-loop variable m is a Match object with the details of the current match.

From <https://www.regular-expressions.info/python.html>

RegexMagic: Regular Expression Generator

From <https://www.regular-expressions.info/regexmagic.html>

Split the string at the last occurrence of sep

2019-05-23

  str.rpartition(sep)

Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

From <https://docs.python.org/3/library/stdtypes.html#string-methods>
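For example:

```python
path = 'archive/2020/report.txt'

parts = path.rpartition('/')
print(parts)     # ('archive/2020', '/', 'report.txt')

missing = path.rpartition(':')   # separator not found
print(missing)   # ('', '', 'archive/2020/report.txt')
```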

The built-in os module has a number of useful functions

The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.

From <https://realpython.com/working-with-files-in-python/#reading-and-writing-data-to-files-in-python>
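A small sketch using os.scandir() on a throwaway temp directory:

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, 'a.txt'), 'w').close()   # one file
os.mkdir(os.path.join(d, 'sub'))              # one subdirectory

# scandir() yields DirEntry objects carrying name and type info
with os.scandir(d) as entries:
    files = sorted(e.name for e in entries if e.is_file())

print(files)   # ['a.txt']
```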

Splitting, Concatenating, and Joining Strings in Python

2019-05-20

Splitting, Concatenating, and Joining Strings in Python

From <https://realpython.com/python-string-split-concatenate-join/>

Regex Tester

https://regex101.com/

processdokuwikifile

2019-05-15

def processdokuwikifile(in_file, par_out_file):
    """Process a DokuWiki search listing and write it out as CSV."""
    with open(par_out_file, 'w', encoding='utf-8') as out_file:
        out = csv.writer(out_file)
        with open(in_file, 'r', encoding='utf-8') as infile:
            pass  # the note's snippet ends here

2019-05-07

For reference, the slide deck that I use to present on this topic is available here. All of the code and the sample text that I use is available in my GitHub repo here.
 • Why parse files?
 • The big picture
 • Parsing text in standard format
 • Parsing text using string methods
 • Parsing text in complex format using regular expressions
 • Step 1: Understand the input format
 • Step 2: Import the required packages
 • Step 3: Define regular expressions
 • Step 4: Write a line parser
 • Step 5: Write a file parser
 • Step 6: Test the parser
 • Is this the best solution?
 • Conclusion

From <https://www.vipinajayakumar.com/parsing-text-with-python/>

PASS BY OBJECT REFERENCE (Case in python):

2019-04-08

PASS BY OBJECT REFERENCE (Case in python): Here, “Object references are passed by value.”

def append_one(li):
    li.append(1)
x = [0]
append_one(x)
print(x)

Here, the statement x = [0] makes a variable x (box) that points towards the object [0].

On the function being called, a new box li is created. The contents of li are the SAME as the contents of box x: both boxes contain the same object. That is, both variables point to the same object in memory. Hence, any change to the object pointed at by li will also be reflected by the object pointed at by x. In conclusion, the output of the above program will be: [0, 1]

Note: If the variable li is reassigned in the function, then li will point to a separate object in memory. x, however, will continue pointing to the same object in memory it was pointing to earlier. Example:

def append_one(li):
    li = [0, 1]
x = [0]
append_one(x)
print(x)

The output of the program will be: [0]

From <https://stackoverflow.com/questions/13299427/python-functions-call-by-reference>

Plotly

2019-03-31

Sample datasets in the plotly/datasets repository:

 • 1962_2006_walmart_store_openings.csv
 • 2010_alcohol_consumption_by_country.csv
 • 2011_february_aa_flight_paths.csv
 • 2011_february_us_airport_traffic.csv

From <https://github.com/plotly/datasets>

Python write to CSV

2019-03-29

Python write to CSV

import csv
with open(..., 'w', newline='') as myfile:   # 'wb' was Python 2; Python 3 wants text mode
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)

From <https://stackoverflow.com/questions/2084069/create-a-csv-file-with-values-from-a-python-list>

    with open(iniFile.absolute(), 'w', newline='') as iniSettings:
        spamwriter = csv.writer(iniSettings)
        folder_list.insert(0, rotation_list)
        for val in folder_list:
            spamwriter.writerow(val)

You can also use wr.writerows(list). – tovmeod, Dec 25 '11
Writerows seems to break up each element in the list into columns if each element is a list as well. This is pretty handy for outputting tables. – whatnick, Oct 7 '14

From <https://stackoverflow.com/questions/2084069/create-a-csv-file-with-values-from-a-python-list>

CSV in Python adding an extra carriage return, on Windows

One of the possible fixes in Python3, as described in @YiboYang's answer, is opening the file with the newline parameter set to be an empty string:

f = open(path_to_file, 'w', newline='')
writer = csv.writer(f)

From <https://stackoverflow.com/questions/3191528/csv-in-python-adding-an-extra-carriage-return-on-windows>
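A round-trip sketch showing the newline='' fix in practice (the temp file path is made up on the fly):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# newline='' hands line-ending control to the csv module,
# so Windows doesn't end up with \r\r\n after every row
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows([['a', 'b'], ['c', 'd']])

with open(path, newline='') as f:
    rows = list(csv.reader(f))

print(rows)   # [['a', 'b'], ['c', 'd']]
```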

Examples of simple type checking in Python:

2019-02-17

Examples of simple type checking in Python:

assert type(variable_name) == int
assert type(variable_name) == bool
assert type(variable_name) == list

From <https://stackoverflow.com/questions/402504/how-to-determine-a-python-variables-type>

Use type

>>> type(one)
<type 'int'>
You can use the __name__ attribute to get the name of the object. (This is one of the few special attributes that you need to use the __dunder__ name to get to - there's not even a method for it in the inspect module.)
>>> type(one).__name__
'int'

From <https://stackoverflow.com/questions/402504/how-to-determine-a-python-variables-type>

isinstance()

With one argument, return the type of an object. The return value is a type object. The isinstance() built-in function is recommended for testing the type of an object, because it takes subclasses into account.

From <https://docs.python.org/2/library/functions.html?highlight=type#type>

• Syntax:

isinstance(object, classinfo) 

The isinstance() takes two parameters:

object : object to be checked
classinfo : class, type, or tuple of classes and types

From <https://www.geeksforgeeks.org/type-isinstance-python/>

graph-cli

2019-01-05

graph-cli

A CLI utility to create graphs from CSV files. graph-cli is designed to be highly configurable for easy and detailed graph generation. It has many flags to acquire this detail and uses reasonable defaults to avoid bothering the user. It also leverages chaining, so you can create complex graphs from multiple CSV files.

From <https://github.com/mcastorina/graph-cli?utm_source=hackernewsletter&utm_medium=email&utm_term=data>

copy2

2018-12-25

copy2: As with the previous methods, the copy2 method is identical to the copy method, but in addition to copying the file contents it also attempts to preserve all of the source file's metadata. If the platform doesn't allow full metadata saving, then copy2 doesn't return failure and it will just preserve any metadata it can. The syntax is as follows:

shutil.copy2(src_file, dest_file, *, follow_symlinks=True)  

From <https://stackabuse.com/how-to-copy-a-file-in-python/>

Start of String Only: \A

The \A anchor specifies that a match must occur at the beginning of the input string. It is identical to the ^ anchor, except that \A ignores the RegexOptions.Multiline option. Therefore, it can only match the start of the first line in a multiline input string.

From <https://docs.microsoft.com/en-us/dotnet/standard/base-types/anchors-in-regular-expressions>
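That quote is from the .NET regex docs, but Python's re module has the same \A anchor with the same behaviour: unlike ^, it is unaffected by re.MULTILINE:

```python
import re

text = 'first line\nsecond line'

multi = bool(re.search(r'^second', text, re.MULTILINE))      # ^ matches each line start
anchored = bool(re.search(r'\Asecond', text, re.MULTILINE))  # \A only the string start
start = bool(re.search(r'\Afirst', text))

print(multi, anchored, start)   # True False True
```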

Decimals interact well with much of the rest of Python

decimal — Decimal fixed point and floating point arithmetic

From <https://docs.python.org/3/library/decimal.html>

Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:

>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')

From <https://docs.python.org/3/library/decimal.html>

Decimals interact well with much of the rest of Python. Here is a small decimal floating point flying circus:

>>> data = list(map(Decimal, '1.34 1.87 3.45 2.35 1.00 0.03 9.25'.split()))
>>> max(data)
Decimal('9.25')
>>> min(data)
Decimal('0.03')
>>> sorted(data)
[Decimal('0.03'), Decimal('1.00'), Decimal('1.34'), Decimal('1.87'),
 Decimal('2.35'), Decimal('3.45'), Decimal('9.25')]

From <https://docs.python.org/3/library/decimal.html>

splitting a number into the integer and decimal parts

>>> a = 147.234
>>> a % 1
0.23400000000000887
>>> a // 1
147.0
>>>

If you want the integer part as an integer and not a float, use int(a//1) instead. To obtain the tuple in a single pass: (int(a//1), a % 1). EDIT: Remember that the decimal part of a float number is approximate, so if you want to represent it as a human would, you need to use the decimal library.

From <https://stackoverflow.com/questions/6681743/splitting-a-number-into-the-integer-and-decimal-parts>

import math
x = 1234.5678
math.modf(x) # (0.5678000000000338, 1234.0)

From <https://stackoverflow.com/questions/6681743/splitting-a-number-into-the-integer-and-decimal-parts>

Create a date object:

import datetime

x = datetime.datetime(2020, 5, 17)

From <https://www.w3schools.com/python/python_datetime.asp>

Module datetime provides

Module datetime provides classes for manipulating dates and times in a more object-oriented way. One of them is datetime.datetime.now, whose timestamp() method returns the number of seconds since the epoch.

import datetime
ts = datetime.datetime.now().timestamp()
print(ts)
# 1545665588.52

From <http://timestamp.online/article/how-to-get-current-timestamp-in-python>

x = int(datetime.datetime(2070, 12, 13, 1, 48, 35).timestamp() - datetime.datetime.now().timestamp()//1)

print(x)  # whole seconds from now until 2070-12-13 01:48:35

Example 2: Right justify string and fill the remaining spaces

# example string
string = 'cat'
width = 5
fillchar = '*'
# print right justified string
print(string.rjust(width, fillchar))

From <https://www.programiz.com/python-programming/methods/string/rjust>

Practical Business Python

2018-12-24

Practical Business Python

pbpython/extras/Pathlib-Cheatsheet.pdf

From <https://github.com/chris1610/pbpython/blob/master/extras/Pathlib-Cheatsheet.pdf>

The divmod() returns

2018-12-23

divmod() returns:

• (q, r) - a pair of numbers (a tuple) consisting of quotient q and remainder r

From <https://www.programiz.com/python-programming/methods/built-in/divmod>
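For example:

```python
q, r = divmod(17, 5)
print((q, r))             # (3, 2)

# handy for unit conversions, e.g. seconds -> minutes and seconds
minutes, seconds = divmod(125, 60)
print(minutes, seconds)   # 2 5
```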

numpy

2018-11-18

pip install numpy

From <https://pypi.org/project/numpy/>

pip3.6 install numpy
pip3.6 install scipy
pip3.6 install matplotlib
pip3.6 install opencv

Install opencv-python instead of cv2.
pip install opencv-python

From <https://github.com/jazzsaxmafia/video_to_sequence/issues/3>

compare the use of lambda

We can compare the use of lambda with that of def to create a function:

adder_lambda = lambda parameter1, parameter2: parameter1 + parameter2

def adder_regular(parameter1, parameter2):
    return parameter1 + parameter2

From <https://stackoverflow.com/questions/8966538/syntax-behind-sortedkey-lambda>

Key Functions

Key Functions Both list.sort() and sorted() have a key parameter to specify a function to be called on each list element prior to making comparisons. For example, here’s a case-insensitive string comparison:

sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

From <https://docs.python.org/3/howto/sorting.html>

This image was created with the following code.

import operator
import pylab
from easydev import Timer

times1, times2, times3, times4 = [], [], [], []
pylab.clf()
d = {"Pierre": 42, "Anne": 33, "Zoe": 24}
for j in range(20):
    N = 1000000
    with Timer(times3):
        for i in range(N):
            sorted_d = sorted((key, value) for (key, value) in d.items())
    with Timer(times2):
        for i in range(N):
            sorted_d = sorted(d.items(), key=lambda x: x[1])
    with Timer(times1):
        for i in range(N):
            sorted_d = sorted(d.items(), key=operator.itemgetter(1))
    with Timer(times4):
        for i in range(N):
            sorted_d = [(k, v) for k, v in d.items()]
    print(j)
pylab.boxplot([times1, times2, times3, times4])
pylab.xticks([1, 2, 3, 4], ["operator", "lambda", "list comprehension and lambda", "py36"])
pylab.ylabel("Time (seconds) 1 million sorting \n (repeated 20 times)")
pylab.grid()
pylab.title("Performance sorted dictionary by values")

From <http://thomas-cokelaer.info/blog/2017/12/how-to-sort-a-dictionary-by-values-in-python/>

As already said, iteritems() will be a problem, but you mention a syntax error, which comes from the lambda declaration with parenthesis: Change:

key=lambda(k, v): sort_order.index(k)

To:

key=lambda k, v: sort_order.index(k)

From <https://stackoverflow.com/questions/47749438/lambda-sorting-in-python-3>
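Since a Python 3 key function receives each item as a single argument, the usual workaround is to index into the (key, value) tuple rather than unpack it in the lambda's parameter list. A minimal sketch (the sort_order list and dict d are made-up values):

```python
sort_order = ['b', 'a', 'c']   # hypothetical desired key ordering
d = {'a': 1, 'b': 2, 'c': 3}
# the key function gets one (key, value) tuple, so index into it
items = sorted(d.items(), key=lambda kv: sort_order.index(kv[0]))
print(items)  # [('b', 2), ('a', 1), ('c', 3)]
```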

What problem does pandas solve?

2018-11-15

Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.

From <https://pandas.pydata.org/>

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities

From <http://www.numpy.org/>

Array Broadcasting

Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size.

From <https://machinelearningmastery.com/broadcasting-with-numpy-arrays/>
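A minimal sketch of broadcasting (assumes numpy is installed): a (3, 1) column and a length-4 row are stretched to a common (3, 4) shape without copying data.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)
grid = col + row                    # shapes broadcast to (3, 4)
print(grid.shape)  # (3, 4)
print(grid[1])     # [11 12 13 14]
```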

scikit-learn

scikit-learn: Machine Learning in Python
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody, and reusable in various contexts
• Built on NumPy, SciPy, and matplotlib
• Open source, commercially usable - BSD license

From <https://scikit-learn.org/stable/>

Welcome to PyBrain

PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

From <http://pybrain.org/>

python read fails on special characters

2018-11-06

with io.open(fileToSearch,'r',encoding='utf-8') as file:

From <https://stackoverflow.com/questions/47635759/how-to-read-a-text-file-with-special-characters-in-python>

An unrelated hint: have a look at the built-in function enumerate, which frees you from taking care of incrementing a counter. You simply write:

for counter, line in enumerate(file):

From <https://stackoverflow.com/questions/47635759/how-to-read-a-text-file-with-special-characters-in-python>
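For example, with a list standing in for the open file object (enumerate works on any iterable):

```python
lines = ['alpha\n', 'beta\n', 'gamma\n']   # stands in for an open file
numbered = []
for counter, line in enumerate(lines, start=1):
    numbered.append((counter, line.strip()))
print(numbered)  # [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]
```

The optional start argument sets the first counter value, which is handy for 1-based line numbers.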

idle args

2018-10-31

A number of IDEs support menu options to set the execution environment for programs under development and testing. In particular, it would be nice if IDLE let the user set command line arguments to be passed into sys.argv when running a script by pressing F5. Here are some existing implementations for reference:
• Wing-IDE: https://wingware.com/doc/intro/tutorial-debugging-launch
• Visual Studio: https://www.youtube.com/watch?v=IgbQCRHKV-Y
• PyCharm: https://www.jetbrains.com/pycharm/help/run-debug-configuration-python.html
This feature will help users interactively develop and test command-line tools while retaining all the nice features of the IDE. I would personally find it useful when teaching students about how sys.argv works.

From <https://bugs.python.org/issue5680>

Pending application of a patch, the following will work to only add args to sys.argv when running from an Idle editor.

import sys
# ...
if __name__ == '__main__':
    if 'idlelib.PyShell' in sys.modules:
        sys.argv.extend(('a', '-2'))  # add your arguments here.
    print(sys.argv)  # in use, parse sys.argv after extending it
    # ['C:\\Programs\\python34\\tem.py', 'a', '-2']

From <https://bugs.python.org/issue5680> 

try:
    __file__
except NameError:
    # __file__ is undefined when run from the IDLE shell, so supply test arguments
    sys.argv = [sys.argv[0], 'argument1', 'argument2', 'argument2']
From <https://stackoverflow.com/questions/2148994/when-running-a-python-script-in-idle-is-there-a-way-to-pass-in-command-line-arg>

Auto detect IDLE and prompt for command-line argument values

2018-10-31

#! /usr/bin/env python3
import sys

def ok(x=None):
    sys.argv.extend(e.get().split())
    root.destroy()

if 'idlelib.rpc' in sys.modules:
    import tkinter as tk
    root = tk.Tk()
    tk.Label(root, text="Command-line Arguments:").pack()
    e = tk.Entry(root)
    e.pack(padx=5)
    tk.Button(root, text="OK", command=ok,
              default=tk.ACTIVE).pack(pady=5)
    root.bind("<Return>", ok)
    root.bind("<Escape>", lambda x: root.destroy())
    e.focus()
    root.mainloop()  # block until the dialog closes, then the script continues

From <https://stackoverflow.com/questions/2148994/when-running-a-python-script-in-idle-is-there-a-way-to-pass-in-command-line-arg/44687632#44687632>

print the files deleted

2018-10-30

Here's a Python script that will also print the files deleted:

import os
for line in open("./data/deleted.files"):
    if line.isspace() or line[0] == '#':
        continue
    line = line.rstrip(os.linesep)
    try:
        if os.path.exists(line):
            print('File removed =>  ' + line)
            os.remove(line)
    except OSError:
        pass

delete directories

Here's an alternative Python script that is case sensitive and will also delete directories included in the list

import os
import shutil
 
def exists_casesensitive(path):
    if not os.path.exists(path):
        return False
    directory, filename = os.path.split(path)
    return filename in os.listdir(directory)
 
with open("./data/deleted.files") as file:
    for line in file:
        line = line.strip()
        if line and not line.startswith('#'):
            path = line.rstrip(os.linesep)
            if exists_casesensitive(path):
                if os.path.isdir(path):
                    shutil.rmtree(path)
                    print('Directory removed =>  ' + path)
                else:
                    os.remove(path)
                    print('File removed =>  ' + path)
            else:
                #print('File not found => ' + path)
                pass

From <https://www.dokuwiki.org/install:unused_files>

checkpoints

2018-10-06 GLOB

def delete_previous_checkpoints(self, num_previous=5):
    """
    Deletes all previous checkpoints that are <num_previous> before the present checkpoint.
    This is done to prevent blowing out of memory due to too many checkpoints

    :param num_previous:
    :return:
    """
    self.present_checkpoints = glob.glob(self.get_checkpoint_location() + '/*.ckpt')
    if len(self.present_checkpoints) > num_previous:
        present_ids = [self.__get_id(ckpt) for ckpt in self.present_checkpoints]
        present_ids.sort()
        ids_2_delete = present_ids[0:len(present_ids) - num_previous]
        for ckpt_id in ids_2_delete:
            ckpt_file_nm = self.get_checkpoint_location() + '/model_' + str(ckpt_id) + '.ckpt'
            os.remove(ckpt_file_nm)

From <https://www.programcreek.com/python/example/92/glob.glob>

argparse

2018-10-06

If you're doing anything more complicated than a script that accepts a few required positional arguments, you'll want to use a parser. Depending on your python version, there are 3 available in the python standard library (getopt, optparse and argparse) and argparse is by far the best.

From <https://stackoverflow.com/questions/35365344/python-sys-argv-and-argparse>

Argparse Tutorial

author: Tshepang Lekhonkhobe

This tutorial is intended to be a gentle introduction to argparse, the recommended command-line parsing module in the Python standard library.

From <https://docs.python.org/3.7/howto/argparse.html>
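A minimal sketch of the argparse pattern (the infile and --verbose names are only illustrative); passing a list to parse_args makes it easy to test without touching sys.argv:

```python
import argparse

# one positional argument and one boolean flag
parser = argparse.ArgumentParser(description='Demo parser')
parser.add_argument('infile', help='input file name')
parser.add_argument('--verbose', action='store_true', help='chatty output')

args = parser.parse_args(['data.csv', '--verbose'])
print(args.infile, args.verbose)  # data.csv True
```

With no argument list given, parse_args() reads sys.argv[1:] as usual.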

*args and **kwargs in Python
*args

The special syntax *args in function definitions in python is used to pass a variable number of arguments to a function. It is used to pass a non-keyworded, variable-length argument list.

From <https://www.geeksforgeeks.org/args-kwargs-python/>
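A small illustration (describe is a made-up function): inside the body, *args arrives as a tuple of positionals and **kwargs as a dict of keyword arguments.

```python
def describe(*args, **kwargs):
    # args is a tuple, kwargs is a dict keyed by argument name
    return len(args), sorted(kwargs)

counts = describe(1, 2, 3, colour='red', size=5)
print(counts)  # (3, ['colour', 'size'])
```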

recursive

2018-10-03

In Python 3.5 and newer use the new recursive ** functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

From <https://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python>

I have successfully used:

for i in d.rglob('*'):

which, with a pathlib.Path, is equivalent to:

for i in d.glob('**/*'):

The “**” pattern means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing:

From <https://docs.python.org/3/library/pathlib.html>
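A self-contained sketch using a throwaway temp directory, showing that '**/*.txt' matches files both at the top level and in subdirectories:

```python
import tempfile
from pathlib import Path

# build a tiny tree: root/a.txt and root/sub/b.txt
root = Path(tempfile.mkdtemp())
(root / 'sub').mkdir()
(root / 'a.txt').write_text('x')
(root / 'sub' / 'b.txt').write_text('y')

# '**' matches this directory and all subdirectories, recursively
found = sorted(p.name for p in root.glob('**/*.txt'))
print(found)  # ['a.txt', 'b.txt']
```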

errno.ENOTEMPTY
	Directory not empty

From <https://docs.python.org/2/library/errno.html>

errno.EACCES¶
	Permission denied

From <https://docs.python.org/2/library/errno.html>

except OSError as e:
    if e.errno not in _IGNORED_ERROS:
        raise
    return False

From <https://github.com/python/cpython/blob/3.7/Lib/pathlib.py>

except OSError as e:
    if e.errno != EINVAL and strict:
        raise
From <https://github.com/python/cpython/blob/3.7/Lib/pathlib.py>
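A small sketch of the same pattern, checking e.errno against the errno constants; note that POSIX allows either ENOTEMPTY or EEXIST for removing a non-empty directory, depending on the platform:

```python
import errno
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, 'keep.txt'), 'w') as fh:
    fh.write('data')

try:
    os.rmdir(d)          # fails: the directory still contains a file
    caught = None
except OSError as e:
    caught = e.errno

print(errno.errorcode[caught])   # ENOTEMPTY on Linux
shutil.rmtree(d)                 # clean up: removes contents, then the dir
```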

walktree

2018-10-01

import os, sys
from stat import *

def walktree(top, callback):
    '''recursively descend the directory tree rooted at top,
       calling the callback function for each regular file'''
    for f in os.listdir(top):
        pathname = os.path.join(top, f)
        mode = os.stat(pathname).st_mode
        if S_ISDIR(mode):
            # It's a directory, recurse into it
            walktree(pathname, callback)
        elif S_ISREG(mode):
            # It's a file, call the callback function
            callback(pathname)
        else:
            # Unknown file type, print a message
            print('Skipping %s' % pathname)

def visitfile(file):
    print('visiting', file)

if __name__ == '__main__':
    walktree(sys.argv[1], visitfile)

From <https://docs.python.org/3/library/stat.html>

Dropbox in python

2018-09-29

from pathlib import Path
import arrow
filesPath = r"C:\scratch\removeThem"
criticalTime = arrow.now().shift(hours=+5).shift(days=-7)
for item in Path(filesPath).glob('*'):
    if item.is_file():
        print (str(item.absolute()))
        itemTime = arrow.get(item.stat().st_mtime)
        if itemTime < criticalTime:
            #remove it
            pass

From <https://stackoverflow.com/questions/12485666/python-deleting-all-files-in-a-folder-older-than-x-days>

In IDLE, go to Options → Configure IDLE → Keys and there select history-next and then history-previous to change the keys. Then click on Get New Keys for Selection and you are ready to choose whatever key combination you want.

From <https://stackoverflow.com/questions/4289937/how-to-repeat-last-command-in-python-interpreter-shell>

CSV Toolkit Overview

2018-09-04

NOTE: THIS PROJECT HAS SINCE BEEN FORKED TO THE INTERNAL PROMETHEUS RESEARCH, LLC TOOL PROPS.CSVTOOLKIT

CSV Toolkit is a Python package that provides validation tooling and processing of CSV files. The validation tooling is based on the fantastic package Vladiate. The interface and extension mechanisms are similarly implemented as the rex.core extension mechanisms.

From <https://pypi.org/project/csv.toolkit/>

What is Bonobo?

Bonobo is a lightweight Extract-Transform-Load (ETL) framework for Python 3.5+. It provides tools for building data transformation pipelines, using plain python primitives, and executing them in parallel. Bonobo is the swiss army knife for everyday's data.

From <https://www.bonobo-project.org/>

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

From <https://csvkit.readthedocs.io/en/1.0.3/>

Awesome Python

A curated list of awesome Python frameworks, libraries, software and resources. Inspired by awesome-php.

• Awesome Python
	○ Admin Panels
	○ Algorithms and Design Patterns
	○ Anti-spam
	○ Asset Management
	○ Audio
	○ Authentication
	○ Build Tools
	○ AND MANY MORE

From <https://github.com/vinta/awesome-python>

Python data visualization: Comparing 7 tools

2018-09-04

The Python scientific stack is fairly mature, and there are libraries for a variety of use cases, including machine learning, and data analysis. Data visualization is an important part of being able to explore data and communicate results, but has lagged a bit behind other tools such as R in the past.

From <https://www.dataquest.io/blog/python-data-visualization-libraries/>

Best way to sort txt file using csv tools in python

2018-09-04

From <https://stackoverflow.com/questions/45221637/best-way-to-sort-txt-file-using-csv-tools-in-python>

import csv
import operator
#==========Search by ID number. Return Just the Name Fields for the Student
with open("studentinfo.txt","r") as f:
  studentfileReader=csv.reader(f)
  id=input("Enter Id:")
  for row in studentfileReader:
    for field in row:
      if field==id:
        currentindex=row.index(id)
        print(row[currentindex+1]+" "+row[currentindex+2])
#=========Sort by Last Name
with open("studentinfo.txt","r") as f:
  studentfileReader=csv.reader(f)
  # sort the parsed rows (not the raw file lines) by the first field
  sortedlist=sorted(studentfileReader,key=operator.itemgetter(0),reverse=True)
  print(sortedlist)

From <https://stackoverflow.com/questions/45221637/best-way-to-sort-txt-file-using-csv-tools-in-python>

2018-08-10

The sys.path list contains the list of directories which will be searched for modules at runtime:

python -v
>>> import sys
>>> sys.path
['', '/usr/local/lib/python25.zip', '/usr/local/lib/python2.5', ... ]

From <https://stackoverflow.com/questions/269795/how-do-i-find-the-location-of-python-module-sources>
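Relatedly, an already-imported pure-Python module records where it was loaded from in its __file__ attribute, so you can locate a module's source without searching sys.path by hand:

```python
import os
import sys

# sys.path holds the directories searched for imports
print(sys.path[0])
# a loaded module records its own origin
print(os.__file__)   # e.g. /usr/lib/python3.x/os.py
```

(C extension modules report a .so or .pyd path instead, and some built-ins have no __file__ at all.)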

For speedtest - /usr/local/lib

software/python.txt · Last modified: 2020/12/27 05:23 by superwizard