Carnegie Mellon University

How to Use Data Quality Self Assessment

Complete your data quality (DQ) self-assessment using the web-based tool Informatica Intelligent Data Management Cloud (IDMC)®. Informatica IDMC can help you:

  • Combine data from a variety of different sources.
  • Understand irregularities and hidden problems in your data.
  • Identify DQ issues related to your business goals.
  • Measure and monitor DQ over time.

If you have questions or need assistance, contact the Data Governance Office at dgo@andrew.cmu.edu.

1. Get Started

Before you begin working with data, log in and learn how to navigate and manage projects and permissions within the Informatica IDMC platform.

If you haven’t already, email the Data Governance Office at dgo@andrew.cmu.edu to request access to Informatica IDMC.

  1. Visit the login URL provided to you by the Data Governance Office.
  2. Click Log in using Single Sign-On (SSO).
  3. On the SAML pop-up window, click Continue.
  4. Log in with your Andrew userID and password, then authenticate with DUO when prompted to access the My Services page.

After you log in to Informatica IDMC, you arrive on My Services. My Services displays services you will use to complete your DQ Self-Assessment, as well as other services available to you and trial subscriptions.

To complete your DQ Self-Assessment, you will use the Data Profiling and Data Quality services.

Click on a service to begin working.

To return to My Services after opening a service, click the drop-down arrow next to the service name (e.g. Data Profiling) in the top left-hand corner of the page, near the Informatica logo.

You may update the following information in your user profile:

  • Personal Information (name, title, email, etc.)
  • Time zone (used to timestamp activity)
  • Password and Security Questions

To edit your user profile, click the User icon and select Profile.

When you finish editing, remember to click Save.

Note: You cannot access the User icon from My Services. Open a service before attempting to edit your user profile.

Projects contain folders that you can use to organize assets. Assets are any objects within the Informatica IDMC platform that you can define, such as profiles and scorecards.

Note: When you begin using Informatica IDMC, the Data Governance Office will create your project and grant you access. At least one member of your project team will have read, write, and grant permissions. If you can’t access a project, folder, or asset you need to accomplish your work, request access from the team member who has grant access.

  1. From Data Profiling > Explore > All Projects, click New Project.
  2. Enter a Name and Description, and click Save.

Your new project appears on the Explore page under All Projects.

  1. From Data Profiling > Explore > All Projects, open an existing project.
  2. Click New Folder.
  3. Enter a Folder Name, and click Save.

Copy projects, folders, or assets to use as a template later or to create a backup copy.

  1. From Data Profiling > Explore > All Projects, locate the project, folder, or asset you want to copy.
    Note: Click a project name to access the folders and assets it contains.
  2. Click the checkbox in the row that contains the object you want to copy.
  3. Right click, then click Actions and select Copy To.
  4. Select the folder you wish to copy the object to.

Note: You must have write permissions on a folder to copy an object to it. Review Manage Roles and Permissions for more information.

  1. From Data Profiling > Explore > All Projects, locate the folder or asset you want to move.
    Note: Click a project name to access the folders and assets it contains.
  2. If source control is enabled, check out the folder or assets you want to move.
    Note: If you want to move an entire folder, you must check out each asset within it.
  3. Click the checkbox in the row that contains the object you want to move.
  4. Right click, then click Actions and select Move To.
  5. Select the folder you want to move the object to.
    Note: You must have write permissions on a folder to move an object to it.
  6. If you checked out the object(s), check them in to reflect the new structure in the Git repository.

You can rename projects, folders, and assets without breaking their connection to other objects.

  1. From Data Profiling > Explore > All Projects, locate the project, folder, or asset you want to rename.
  2. Click the checkbox in the row that contains the object you want to rename.
  3. Right click, then click Actions and select Rename.
  4. Rename the object and click Save.

Check with your collaborators before deleting anything to ensure you are not interrupting any current or future work.

  1. From Data Profiling > Explore > All Projects, locate the project, folder, or asset you want to delete.
  2. If source control is enabled, check out the folder or assets you want to delete.
  3. In the row that contains the object you want to delete, right click and select Delete.
  4. If you checked out the object(s), check them in to reflect the new structure in the Git repository.

A tag is a property you can apply to an asset. Tags allow you to filter for assets that share common attributes.

You can assign multiple tags to a single asset.

Note: You cannot tag projects or folders.

Assign a Tag

Assign a tag to one asset at a time, or tag multiple assets simultaneously.

  1. From Data Profiling > Explore > All Projects, locate the asset you want to tag.
  2. If source control is enabled, check out the assets you want to tag.
  3. Right click on the asset and select Properties.
    Note: If you want to tag multiple assets simultaneously, click the checkboxes in the rows with the assets you want to tag.
  4. Assign an existing tag from the drop-down or enter a new tag.

Edit or Delete a Tag

  1. From Data Profiling > Explore > All Projects, click the drop-down next to Explore.
  2. Select Tags.
  3. Right click a tag to edit or delete it.

The Data Governance Office will grant at least one member of your project team an administrator role, which allows create, read, update, delete, run, and set permissions for all assets in a project.

Administrators may adjust roles and permissions for individual users or groups of users.

To learn more about the different roles and permissions available in Informatica IDMC, visit Permissions and Privileges.

To change permissions for an existing user:

  1. From Data Profiling > Explore, click the drop-down next to Explore.
  2. Select Projects and Folders.
  3. Right click on a project or folder and select Permissions.
  4. Use the checkboxes to change permissions for a user.

To add users and set their permissions:

  1. From Data Profiling > Explore, click the drop-down next to Explore.
  2. Select Projects and Folders.
  3. Right click on a project or folder and select Permissions.
  4. Click Add.
  5. Select the user(s) you wish to add.
  6. Click Add.
  7. From Permissions, use the checkboxes to set permissions for the new user(s).
  8. Click Save.

To remove users:

  1. From Data Profiling > Explore, click the drop-down next to Explore.
  2. Select Projects and Folders.
  3. Right click on a project or folder and select Permissions.
  4. Select the user(s) you want to remove by clicking the checkbox next to their name.
  5. Click Remove.
  6. Click Save.

2. Understand Your Data

Now that you’ve set up your project, you can begin to understand the data you’re working with. Informatica IDMC analyzes your data, or source object, using profiles and queries, which can help you find data quality issues related to completeness, conformity, and more.

Connect source objects and run profiles and queries from Data Profiling.

Profiles, or data profiling tasks, help you understand the quality of your data by locating and analyzing irregularities and hidden problems across sources. When you run a profile, you can assess several aspects of the data, including completeness, conformity, and consistency.
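
As a rough illustration of what a profile measures, the sketch below computes completeness, a distinct-value count, and value patterns for one column. It is not how Data Profiling works internally; the sample data, the column names, and the helper names `pattern_of` and `profile_column` are all assumptions made for this example.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data; the column names are illustrative only.
SAMPLE = """name,zip
Doe,15213
Smith,
Green,1521A
"""


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become X."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "X", shape)


def profile_column(values: list[str]) -> dict:
    """Summarize one column: row count, completeness, distinct count, patterns."""
    filled = [v for v in values if v.strip() != ""]
    return {
        "rows": len(values),
        "completeness": len(filled) / len(values) if values else 0.0,
        "distinct": len(set(filled)),
        "patterns": Counter(pattern_of(v) for v in filled),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
zip_profile = profile_column([r["zip"] for r in rows])
print(zip_profile)
```

In this made-up sample, the empty zip value lowers completeness, and the pattern 9999X points to a value that does not conform to the expected five digits.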

  1. From Data Profiling > Explore > All Projects, click New (+).
  2. Click Data Profiling Task.
  3. Enter Asset Details.
    1. Enter a Name.
    2. (Optional) Enter a Description.
    3. Click Browse to select a Location.
      Note: The location is the project in which you will save the profile.
  4. Enter Source Details.
    1. Use the drop-down to select a Connection.
      Note: The Data Governance Office creates connections when they create your project.
    2. Click Select to open the Select a Source Object window.
    3. Click one of the Source Objects associated with that connection, and then click Select.
    4. Click Formatting Options.
    5. Select the Delimiter and Text Qualifier that best matches your data.
    6. (Optional) Enter an Escape Character.
      Note: When you specify an escape character, Data Profiling reads the delimiter or text qualifier that follows it as a regular character. An escape character is a character in your data that does one of the following:
      • Immediately precedes a column delimiter character that is embedded within an unquoted string.
      • Immediately precedes a quote character in a quoted string.
    7. Select the Field Labels option that best matches your data.
      • If your data contains column labels, select Import from Row and enter the row number that contains the column labels.
      • If your data does not contain column labels, select Auto-generate.
      • Enter the row number that contains the First Data Row.

      Note: See Formatting Examples for sample use cases.

  5. Adjust Profile Settings.
    1. Run Profile on:
      • Select All rows for complete analysis if you connected to a Flat File source object.
      • Select First n rows if you connected to a Database source object and enter the number of rows you want to profile.
    2. Drill down:
      • Select Yes if you want to run queries on the source object after you run the profile.
      • Select No if you do not.
  6. Below Asset and Source details, select the Columns from your data that you want to profile.
    Note: When connecting to a flat file source object, you may change the Native Data Type, Precision, and Scale for the columns that you select.
  7. (Optional for Flat File source objects) Click (Override Column Metadata) to override any column definition.
    1. Click the checkbox next to the column(s) you want to modify.
    2. To change the Native Data Type, click the dropdown and select an alternative.
      1. If you modify the Native Data Type to a string, you can change the Precision. 
      2. If you modify the Native Data Type to a number, you can change the Precision and Scale.
      3. If you modify the Native Data Type to a bigint, double, int, or number, you cannot change the Precision or Scale.
    3. Adjust the Precision and Scale as desired.
    4. Click Apply.
  8. Click Save to save your profile.

At this point, you may begin another task, end your session, or click Run to begin your profile.


Formatting Examples


Example 1

Formatting Options

  Delimiter:         Comma
  Text Qualifier:    Double Quotes
  Escape Character:  \

Source Data

  "name","home-city","home-state","enrollment-date"
  "Doe, John","Pittsburgh","PA",
  "Smith, Jane","Tuscon","AZ"
  "Green, Gerald \"Jerry\"","Denver","CO"

Interpreted As

  name                     home-city     home-state
  Doe, John                Pittsburgh    PA
  Smith, Jane              Tuscon        AZ
  Green, Gerald "Jerry"    Denver        CO


Example 2

Formatting Options

  Delimiter:         Comma
  Text Qualifier:    None
  Escape Character:  \

Source Data

  name,home-city,home-state,enrollment-date
  Doe\, John,Pittsburgh,PA
  Smith\, Jane,Tuscon,AZ
  Green\, Gerald "Jerry",Denver,CO

Interpreted As

  name                     home-city     home-state
  Doe, John                Pittsburgh    PA
  Smith, Jane              Tuscon        AZ
  Green, Gerald "Jerry"    Denver        CO
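
To see how the delimiter, text qualifier, and escape character interact, you can reproduce both examples outside Informatica IDMC. The sketch below uses Python's standard csv module as a stand-in; it is not the parser that Data Profiling uses, and the two may differ on edge cases. The rows are adapted from the examples above, with every embedded quote in the first example escaped.

```python
import csv

# Example 1: comma delimiter, double-quote text qualifier, backslash escape.
example_1 = [
    r'"Doe, John","Pittsburgh","PA"',
    r'"Green, Gerald \"Jerry\"","Denver","CO"',
]
rows_1 = list(
    csv.reader(
        example_1,
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,  # embedded quotes are escaped, not doubled
    )
)

# Example 2: comma delimiter, no text qualifier, backslash escape.
example_2 = [
    r"Doe\, John,Pittsburgh,PA",
    r'Green\, Gerald "Jerry",Denver,CO',
]
rows_2 = list(
    csv.reader(
        example_2,
        delimiter=",",
        quoting=csv.QUOTE_NONE,  # quote characters are ordinary data
        escapechar="\\",
    )
)

for row in rows_1 + rows_2:
    print(row)
```

Both settings yield the same three fields per row. In the first, the text qualifier protects the embedded comma and the escape character protects the embedded quotes. In the second, the escape character protects the comma and the quotes are read as regular characters.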

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.

View your profile results from the Results tab. 

The Data Preview section includes the first 10 rows and all columns in your source object. Checkmarks indicate that Data Profiling supports the data in that column.

The Profile Scope displays the number of rows that the profile assessed.

Note: To view the Data Preview section, you must have the Data Profiling - Data Preview privilege. Review Manage Roles and Permissions for more information.

You may export profile results to a Microsoft Excel file.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. In the Results tab, click the three-dot icon, and then click Export Profile Results.
  5. Enter a File name or use the default file name.
  6. Select a Range.
  7. Choose one or more of the following Scope options:
    • Summary – exports all the profile results.
    • Value Frequency – exports only the value frequencies.
    • Statistics – exports only the statistics.
    • Patterns – exports only the patterns.
    • Data Types – exports only the documented and inferred data types.
  8. Leave the File Format as Microsoft Excel and the Code Page as 7-bit ASCII.
  9. Click Export.

As you review profile results, you may see values that you don’t understand. Consult CMU’s Data Catalog to review data definitions.

  1. From Data Profiling > Explore > All Projects, select the project containing the profile you want to run.
  2. Click the profile you want to run.
  3. Click Run.

Depending on the quantity of data you are assessing, it may take several minutes to run a profile. You can monitor current and recent profile runs from My Jobs.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.

Review top-level information about the profile, or drill down into its subtasks to learn more.

From Data Profiling > My Jobs, locate the row that contains the profile that you want to stop or resume, and then click the stop or resume icon at the end of the row.

Filters allow you to specify a subset of values from a column within your data to profile. You can create multiple filters in a profile.

Create a Filter

  1. From Data Profiling > Explore > All Projects, select the project containing the profile to which you want to add a filter.
  2. Click the profile to which you want to add a filter.
  3. Below Asset and Source details, click the Filters tab.
  4. Click the plus sign.
  5. Enter a Name.
  6. (Optional) Enter a Description.
  7. Click the plus sign to enter a filter condition.
  8. Use the dropdowns to choose a column and operator.
  9. Enter a valid value.
  10. Click the plus sign to add additional filter conditions as desired.
  11. Click OK.

Add a Filter to a Profile run

You may add or remove filters each time you run a profile to generate different results.

  1. From Data Profiling > Explore > All Projects, select the project containing the profile to which you want to add a filter.
  2. Click the profile to which you want to add a filter.
  3. Below Asset and Source details, click the Filters tab.
  4. Select the Filters you want to apply.
  5. Click the checkbox next to Use in Profile.
  6. Click Save.

After you create and run a profile, you may want to edit the profile to generate different results. You can change the profile definition, add or remove filters, add or remove rules, and more.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. Click the Definitions tab.

From Definitions, you may edit the following options:

  • Asset details, such as the profile name and description.
  • Source details, such as the connection and source object.
  • Profile settings, such as the drill down options or rows profiled.
  • Filters run on the profile.
  • Rule specifications applied to the profile.
  • Schedule details, including runtime environment and notification options.

You can run a profile multiple times. Each time you run a profile, Data Profiling saves the results in the Informatica Intelligent Cloud Services repository.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. In the Results tab, click the three-dot icon, and then click Choose Profile Run.
  5. Click a profile run, and then click Select.

You may want to compare the results from two profile runs.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. In the Results tab, click the three-dot icon, and then click Compare Profile Runs.
  5. Select two profile runs and click Compare.

You can compare results in a profile run from two or more columns.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. In the Results tab, click the three-dot icon, and then click Compare Columns.
  5. Select two or more columns by clicking the checkbox next to their names.
    Note: Enter a keyword in Find to search for a column.
  6. Click Compare.

When you delete a profile run, Data Profiling permanently deletes the results from the profiling warehouse.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. In the Results tab, click the three-dot icon, and then click Delete Profile Runs.
  5. Select one or more profile runs by clicking the checkbox next to each name.
  6. Click Delete.
    Note: You can delete up to 50 profile runs at a time.
  7. Click Close.

After you run a profile, you can run queries on the source object to review rows with data quality issues. You can query source objects based on field or column values, inferred patterns, data types, and rule outputs.

Note: You can only run queries on a profile if you selected Yes for Drill down under the Profile Settings. To adjust the Profile Settings, review Edit Subsequent Profile Runs.

To create a query, you must have the Data Profiling - Query - Create privilege. Review Manage Roles and Permissions for more information.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. On the Results tab, click the Queries button located in the bottom right.
  4. Click the plus sign.
  5. Enter a Name.
  6. (Optional) Enter a Description.
  7. Click the plus sign to add conditions.
  8. Use the dropdowns to select a Column and Operator.
  9. Enter a valid Value(s).
  10. Use the plus sign to add additional query conditions as desired.
  11. Above the query conditions, use the dropdown to select one of the following options:
    1. Query meets All the following conditions.
    2. Query meets One of the following conditions.
  12. Click OK.
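
The difference between the All and One of options can be sketched in a few lines of code. This is only an illustration of the logic; the operator names, the `matches` helper, and the sample rows are assumptions for this example and do not reflect the operators or internals of Data Profiling.

```python
import operator
from typing import Callable

# Hypothetical operator names mapped to comparison functions.
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "equals": operator.eq,
    "not equals": operator.ne,
    "contains": lambda cell, value: value in cell,
}

# A condition is (column, operator name, value).
Condition = tuple[str, str, str]


def matches(row: dict[str, str], conditions: list[Condition], require_all: bool) -> bool:
    """Return True when the row meets all, or at least one, of the conditions."""
    results = (OPERATORS[op](row[column], value) for column, op, value in conditions)
    return all(results) if require_all else any(results)


rows = [
    {"home-state": "PA", "home-city": "Pittsburgh"},
    {"home-state": "AZ", "home-city": "Tuscon"},
    {"home-state": "PA", "home-city": ""},
]
conditions = [("home-state", "equals", "PA"), ("home-city", "equals", "")]

meets_all = [r for r in rows if matches(r, conditions, require_all=True)]
meets_one = [r for r in rows if matches(r, conditions, require_all=False)]
print(len(meets_all), len(meets_one))
```

With All, only the row that is in PA and has an empty city is returned. With One of, every PA row is returned, along with any row that has an empty city.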

Data Profiling runs queries on the runtime environment associated with the connection(s) established for your project.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. On the Results tab, click the Queries button located in the bottom right.
  4. Select one or more of the queries by clicking the checkbox next to each name.
  5. Click Run.
  6. Use the dropdown to choose a connection.
  7. Click Run.

Data Profiling generates query results files and saves them in the directory specified in the connection. If the profile associated with the query has rules, Data Profiling also generates a legend file, which explains the content of the query results file.

Note: Each time you run a given query, you overwrite previous results in the results file.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile containing the query you want to view and click its name.
  3. On the Results tab, click the Queries button located in the bottom right.
  4. Select one or more of the queries by clicking the checkbox next to each name.
  5. Click Show.

Query results appear in the Data Preview section of the Results tab.

When you delete a query, Data Profiling deletes the query from the profile but it does not delete the results file and legend file related to the query.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile containing the query you want to view and click its name.
  3. On the Results tab, click the Queries button located in the bottom right.
  4. Select one or more of the queries by clicking the checkbox next to each name.
  5. Click the trash can icon.

3. Align DQ with Business Goals

Rules, or rule specifications, represent data requirements of a business need. You can create rules to identify data quality issues that prevent your source data from conforming to the criteria specified by your business need. For example, you could use a rule to verify that all zip code data in your source is numeric data.
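
Before building the zip code rule in Data Quality, it can help to write the check down as plain logic. The sketch below is one possible reading of that rule, not the rule specification itself; the function name `zip_is_numeric` and the Valid/Invalid output labels are assumptions for this example.

```python
def zip_is_numeric(value: str) -> str:
    """Return 'Valid' when the value contains only ASCII digits, else 'Invalid'."""
    return "Valid" if value.isascii() and value.isdigit() else "Invalid"


# Made-up sample inputs covering a clean value and three problem cases.
samples = ["15213", "1521A", "", "15213-3890"]
results = {value: zip_is_numeric(value) for value in samples}

for value, outcome in results.items():
    print(repr(value), outcome)
```

Note that this reading treats empty values and ZIP+4 values such as 15213-3890 as invalid. Whether that matches your business need is a decision to make when you write the rule description.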

To create rules, use Data Quality.

To apply rules to a profile, use Data Profiling.

  1. From My Services > Data Quality, click + New.
  2. Click the Rule Specification asset.

Rule Specification contains two tabs:

  • Definition – used to define the name and location of the rule specification.
  • Configuration – used to configure the rule specification logic.

To define the rule specification:

  1. Click the Definition tab.
  2. Enter a Name.
  3. (Optional) Enter a Description.
    Note: We recommend entering a summary of the business need motivating your rule.
  4. Click Browse and select a location to save your rule specification.
    Note: Because you are creating the asset, ignore the Asset References fields. A new asset contains no asset references.
  5. (Optional) Select a Data Quality Dimension to represent the purpose of the rule specification.
  6. Click Save.
    Note: Data Quality will fill in the values in the Asset References field with the date you created the rule specification and your name.
  7. (Optional) Add a Tag to the rule specification.

To configure the rule specification:

  1. Click the Configuration tab.
    Note: Data Quality displays a warning that the configuration is incomplete.
  2. Under Properties: PrimaryRuleSet, configure a rule statement.
    Note: The primary rule set defines the data output from the rule specification.
  3. Click Primary Rule Set, and then click General.
  4. Update the Rule Set Name.
  5. Click the plus sign to add rule sets.
  6. Click Inputs to add one or more inputs to each rule set.
  7. Click Manage Inputs.
  8. Click + Add Input.
  9. Enter the name, data type, maximum length, scale, and description.
  10. Click OK.
  11. To add a rule statement that the rule specification requires:
    1. Click Rule Logic.
    2. Build your rule definition by entering the input, operator, condition, and action.
  12. Click Save.

To test the rule specification:

  1. Click the Configuration tab.
  2. Click Test.
  3. Enter sample data.
  4. Click Run Test.

To edit the rule specification:

  1. From Data Quality > Asset Type, click Rule Specification.
  2. Click the name of the rule specification you want to edit.
  3. Edit the rule specification.
  4. Click Save.

You can add one or more rules to a profile. Add rules from the Data Profiling service.

Note: Review Create a Rule Specification before you add rules to a profile.

  1. From My Services > Data Profiling > Explore > All Projects, open your project.
  2. Open a Profile.
  3. Click the Rules tab.
  4. Click the plus sign.
  5. Under Name, choose a rule specification.
  6. Click Select.
  7. Perform one of the following actions:
    1. If the rule specification uses a single input, choose one or more columns for the rule input.
    2. If the rule specification uses multiple inputs, choose one column for each input.
  8. Click OK.
  9. Click the plus sign to add additional rule specifications as desired.
  10. Click Save.

4. Monitor DQ Over Time

After you add rule specifications to a profile, Informatica IDMC generates scorecards that you can use to monitor data quality.

  1. From My Services > Data Profiling > Explore > All Projects, open your project.
  2. Click the Metrics tab.
  3. Click the three-dot icon, and then click Select Scorecard metrics.
  4. Click the plus sign.
  5. Click your rule set, and then click Next.
  6. Set your rule occurrence thresholds.
  7. Click OK.
  8. Click Save.

To view a scorecard:

  1. From My Services > Data Profiling > Explore > All Projects, open your project.
  2. Click the Metrics tab.
  3. Click View Scorecard.

Informatica IDMC displays scorecards in the Cloud Data Governance and Catalog.
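
When choosing thresholds, it can help to work through how a score might be read against them. The sketch below is a simplified illustration, not the calculation that Informatica IDMC performs; the band names, the cut-off values, and the sample run data are all hypothetical.

```python
def percent_valid(outcomes: list[bool]) -> float:
    """Share of profiled rows that passed the rule, as a percentage."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


def rating(score: float, good_at: float = 90.0, acceptable_at: float = 70.0) -> str:
    """Map a score to a band. The band names and cut-offs are hypothetical."""
    if score >= good_at:
        return "Good"
    if score >= acceptable_at:
        return "Acceptable"
    return "Unacceptable"


# One list of pass/fail rule outcomes per profile run (made-up data).
runs = {
    "run 1": [True, True, False, True],
    "run 2": [True] * 19 + [False],
}
report = {name: (percent_valid(o), rating(percent_valid(o))) for name, o in runs.items()}

for name, (score, band) in report.items():
    print(f"{name}: {score:.1f}% valid -> {band}")
```

Comparing the same score across runs is what makes a scorecard useful for monitoring: a move from one band to another between runs signals a change in data quality worth investigating.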

After you create a profile, add rule specifications, and set up scorecard metrics, you may want to schedule regular profile runs to help monitor data quality over time.

  1. From Data Profiling, click My Jobs to view all your profile runs.
  2. Search the Instance Name column for the profile you want to monitor and click its name.
  3. Double-click the profile name in the Asset Name field.
  4. Click the Definitions tab.

From Definitions, you may edit the run schedule.