Topic: Data proofer checks a dataset for errors or potential mistakes.

Software Santa

Data proofer checks a dataset for errors or potential mistakes.
« on: February 21, 2020, 08:51:43 PM »
Dataproofer is built to automate the process of checking a dataset for errors or potential mistakes.

Quote
Dataproofer

A proofreader for your data. Currently in beta.


Every day, more and more data is created. Journalists, analysts, and data visualizers turn that data into stories and insights.

But before you can make use of any data, you need to know if it’s reliable. Is it weird? Is it clean? Can I use it to write or make a viz?

This used to be a long manual process that took up valuable time and introduced the possibility of human error. People can’t always spot every mistake every time, no matter how hard they try.

Dataproofer is built to automate this process of checking a dataset for errors or potential mistakes.

Getting Started (Desktop)

Download a .zip of the latest release from the Dataproofer releases page.

Drag the app into your applications folder.

Select your dataset, which can be either a CSV on your computer, or a Google Sheet that you’ve published to the web.

Once you select your dataset, you can choose which suites and tests run by turning them on or off.

Proof your data, get your results, and feel confident about your dataset.
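
For readers who want to script similar checks outside the desktop app, the dataset first has to be read into memory. The sketch below is only an illustration in Python and is not part of Dataproofer; the function name read_table is hypothetical. It assumes the source is either a local CSV path or an http(s) URL that serves CSV text encoded as UTF-8.

```python
import csv
import io
from urllib.request import urlopen


def read_table(source: str) -> tuple[list[str], list[list[str]]]:
    """Return (header, rows) from a local CSV path or an http(s) URL serving CSV.

    Hypothetical helper for illustration; assumes UTF-8 text and a header row.
    """
    if source.startswith(("http://", "https://")):
        # Remote source, e.g. a sheet published to the web as CSV.
        with urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        # Local file; newline="" leaves line endings for the csv module to handle.
        with open(source, newline="", encoding="utf-8") as handle:
            text = handle.read()

    table = list(csv.reader(io.StringIO(text, newline="")))
    if not table:
        return [], []
    return table[0], table[1:]
```

Every cell comes back as a string, which is why the later sketches parse numbers explicitly instead of assuming typed values.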

Test Suites
Information & Diagnostics

A set of tests that infer descriptive information based on the contents of a table's cells.

    Check for numeric values in columns
    Check for strings in columns
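
As a rough illustration of what the two checks above report, a column can be summarized by counting how many cells parse as numbers and how many do not. This is a minimal Python sketch, not Dataproofer's own code; the names is_numeric and column_type_counts are hypothetical, and "numeric" is assumed to mean anything Python's float() accepts.

```python
def is_numeric(cell: str) -> bool:
    """True if float() can parse the cell (assumption: this defines 'numeric').

    Note that float() also accepts "nan" and "inf", and rejects "1,000".
    """
    try:
        float(cell)
        return True
    except ValueError:
        return False


def column_type_counts(rows: list[dict[str, str]], column: str) -> dict[str, int]:
    """Count numeric, string, and empty cells in one column."""
    counts = {"numeric": 0, "string": 0, "empty": 0}
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell == "":
            counts["empty"] += 1
        elif is_numeric(cell):
            counts["numeric"] += 1
        else:
            counts["string"] += 1
    return counts


rows = [{"age": "34"}, {"age": "n/a"}, {"age": ""}, {"age": "2.5"}]
print(column_type_counts(rows, "age"))
```

A column that is mostly numeric with a handful of strings is often the interesting case, since the stray strings tend to be placeholders such as "n/a".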

Core Suite

A set of tests related to common problems and data checks — namely, making sure data has not been truncated by looking for specific cut-off indicators.

    Check for duplicate rows
    Check for empty columns (no values)
    Check for special, non-typical Latin characters/letters in strings
    Check for big integer cut-offs as defined by MySQL and PostgreSQL, common database programs
    Check for integer cut-offs as defined by MySQL and PostgreSQL, common database programs
    Check for small integer cut-offs as defined by MySQL and PostgreSQL, common database programs
    Check for whether there are exactly 65k rows — an indication there may be missing rows lost when the data was exported from a database
    Check for strings that are exactly 255 characters — an indication there may be missing data lost when the data was exported from MySQL
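
Several of the core checks come down to comparing values against well-known limits. The Python sketch below is an illustration under stated assumptions rather than Dataproofer's implementation: all function names are hypothetical, the integer limits are the documented maxima for SMALLINT, INT, and BIGINT (signed for MySQL and PostgreSQL, plus the unsigned MySQL variants), and "65k rows" is assumed to mean 65,535 or 65,536.

```python
from collections import Counter

# Signed maxima (MySQL and PostgreSQL) and unsigned maxima (MySQL only).
INTEGER_CUTOFFS = {
    "smallint": {32767, 65535},
    "int": {2147483647, 4294967295},
    "bigint": {9223372036854775807, 18446744073709551615},
}
SUSPICIOUS_ROW_COUNTS = {65535, 65536}  # assumption: what "65k rows" refers to
SUSPICIOUS_STRING_LENGTH = 255


def duplicate_rows(rows: list[list[str]]) -> dict[tuple, int]:
    """Return each row that appears more than once, with its count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


def empty_columns(header: list[str], rows: list[list[str]]) -> list[str]:
    """Return names of columns whose cells are all blank or missing."""
    return [
        name
        for i, name in enumerate(header)
        if all(i >= len(row) or row[i].strip() == "" for row in rows)
    ]


def integer_cutoffs(cells: list[str]) -> list[tuple[str, str]]:
    """Return (cell, type_name) for cells equal to a known integer maximum."""
    hits = []
    for cell in cells:
        try:
            value = int(cell)
        except ValueError:
            continue  # not an integer, so not a cut-off candidate
        for type_name, limits in INTEGER_CUTOFFS.items():
            if value in limits:
                hits.append((cell, type_name))
    return hits


def truncated_strings(cells: list[str]) -> list[str]:
    """Return cells whose length is exactly 255 characters."""
    return [cell for cell in cells if len(cell) == SUSPICIOUS_STRING_LENGTH]


def suspicious_row_count(rows: list[list[str]]) -> bool:
    """True if the row count matches a common export limit."""
    return len(rows) in SUSPICIOUS_ROW_COUNTS
```

A hit from any of these is a hint, not proof: a value of 2147483647 or a 255-character string can be legitimate, so flagged cells still need a human look.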

Geo Suite

A set of tests related to common geographic data problems.

    Check for invalid latitude and longitude values (values outside the valid ranges: latitude from -90° to 90°, longitude from -180° to 180°)
    Check for void latitude and longitude values (values at 0°,0°)
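
The two geographic checks can be sketched as a single classifier. This is an illustrative Python example, not Dataproofer's code; the function name classify_coordinate and its return labels are hypothetical. It assumes the conventional bounds for decimal degrees, with latitude limited to -90 through 90 and longitude to -180 through 180, and it treats unparseable or non-finite values as invalid.

```python
import math


def classify_coordinate(lat: str, lon: str) -> str:
    """Return "invalid", "void", or "ok" for a latitude/longitude pair.

    Assumes decimal degrees: latitude in [-90, 90], longitude in [-180, 180].
    """
    try:
        lat_value, lon_value = float(lat), float(lon)
    except (TypeError, ValueError):
        return "invalid"  # not a number at all
    if not (math.isfinite(lat_value) and math.isfinite(lon_value)):
        return "invalid"  # rejects "nan" and "inf"
    if not (-90.0 <= lat_value <= 90.0 and -180.0 <= lon_value <= 180.0):
        return "invalid"  # outside the valid range
    if lat_value == 0.0 and lon_value == 0.0:
        return "void"  # 0,0 is usually a placeholder for missing data
    return "ok"


for pair in [("40.7", "-74.0"), ("95", "10"), ("0", "0")]:
    print(pair, classify_coordinate(*pair))
```

The "void" case is kept separate from "invalid" because 0°,0° is a real point in the Gulf of Guinea; it is flagged only because it so often stands in for missing coordinates.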

Stats Suite

A set of tests related to common statistical methods used to detect outlying data.

    Check for outliers within a column relative to the column's median
    Check for outliers within a column relative to the column's mean
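
The source does not say which thresholds these two checks use, so the following Python sketch picks its own: the mean-based check flags values more than 3 standard deviations from the mean, and the median-based check flags values more than 3 median absolute deviations (MAD) from the median. Both thresholds and both function names are assumptions made for illustration, not Dataproofer's actual method.

```python
import statistics


def mean_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Values more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


def median_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Values more than `threshold` median absolute deviations from the median."""
    if not values:
        return []
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # over half the values equal the median; ratio is undefined
    return [v for v in values if abs(v - median) / mad > threshold]


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
print(median_outliers(values))  # [500]
print(mean_outliers(values))    # []
```

The example shows why having both checks is useful. With only ten values, the single extreme value inflates the standard deviation so much that the mean-based check at a threshold of 3 misses it, while the median-based check still flags it.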

http://dataproofer.org/

https://github.com/dataproofer/Dataproofer#getting-started

 
